BOUNDING RELATIVE ENTROPY BY THE RELATIVE
ENTROPY OF LOCAL SPECIFICATIONS IN PRODUCT SPACES 



Katalin Marton 
Alfred Renyi Institute of Mathematics, Hungarian Academy of Sciences 

Abstract. For a class of density functions $q^n(x^n)$ on $\mathbb{R}^n$ we prove an inequality between relative entropy and the sum of average conditional relative entropies of the following form: For any density function $p^n(x^n)$ on $\mathbb{R}^n$,

$$D(p^n\|q^n) \le \mathrm{Const.}\cdot\sum_{i=1}^n \mathbb{E}_{p^n} D\big(p_i(\cdot\mid Y_1,\dots,Y_{i-1},Y_{i+1},\dots,Y_n)\,\big\|\,Q_i(\cdot\mid Y_1,\dots,Y_{i-1},Y_{i+1},\dots,Y_n)\big),$$

where $p_i(\cdot\mid y_1,\dots,y_{i-1},y_{i+1},\dots,y_n)$ and $Q_i(\cdot\mid x_1,\dots,x_{i-1},x_{i+1},\dots,x_n)$ denote the local specifications for $p^n$ resp. $q^n$, i.e., the conditional density functions of the $i$'th coordinate, given the other coordinates. The constant depends on the properties of the local specifications of $q^n$.

The above inequality implies a logarithmic Sobolev inequality for $q^n$. We get an explicit lower bound for the logarithmic Sobolev constant of $q^n$ under the assumptions that:

(i) the local specifications of $q^n$ satisfy logarithmic Sobolev inequalities with constants $\rho_i$, and

(ii) they also satisfy some condition expressing that the mixed partial derivatives of the Hamiltonian of $q^n$ are not too large relative to the logarithmic Sobolev constants $\rho_i$.

Condition (ii) may be weaker than that used in Otto and Reznikoff's recent paper on the estimation of logarithmic Sobolev constants of spin systems.



1.1 The result 

Let $(\mathcal{X}, d)$ be a Polish space, where we shall work with the Borel $\sigma$-algebra. Let $\mathcal{X}^n$ denote the $n$-th power of the Borel space $(\mathcal{X},d)$, considered with the Borel $\sigma$-algebra. Let us fix a probability measure on $\mathcal{X}^n$, given by the density

$$q^n(x^n) = \exp(-V(x^n))$$

(with respect to some product measure $\lambda^n = \prod_{i=1}^n \lambda_i$).

In the sequel we shall not distinguish between probability measures and their densities.

We shall use the following 
Notation: 

• For $x^n = (x_1, x_2, \dots, x_n) \in \mathcal{X}^n$ and $1 \le i \le n$,

$\bar x_i = (x_j : j \ne i)$, $x^i = (x_j : j \le i)$, $x_i^n = (x_j : i \le j \le n)$;

• $q^n$: a fixed probability measure on $\mathcal{X}^n$;

• $X^n$: random sequence in $\mathcal{X}^n$, $\operatorname{dist}(X^n) = q^n$;

• $p^n$: another density function on $\mathcal{X}^n$;

• $Y^n = Y^n(0)$: random sequence in $\mathcal{X}^n$, $\operatorname{dist}(Y^n) = p^n$;

• $p_i(\cdot\mid\bar y_i) = \operatorname{dist}(Y_i\mid\bar Y_i = \bar y_i)$ $(1 \le i \le n)$: conditional density functions consistent with $p^n$;

• $\bar p_i := \operatorname{dist}(\bar Y_i)$, $p^i = \operatorname{dist}(Y^i)$, $p_{i+1}^n(\cdot\mid y^i) = \operatorname{dist}(Y_{i+1}^n\mid Y^i = y^i)$;

• $Q_i(x_i\mid\bar x_i)$, $1 \le i \le n$, $x_i \in \mathcal{X}$, $\bar x_i \in \mathcal{X}^{n-1}$: the local specifications of $q^n$: $Q_i(\cdot\mid\bar x_i) = \operatorname{dist}(X_i\mid\bar X_i = \bar x_i)$ $(1 \le i \le n)$.

Definition. For measures $p$ and $Q$ on $\mathcal{X}$, we denote by $D(p\|Q)$ the relative entropy (also called informational divergence) of $p$ with respect to $Q$:

$$D(p\|Q) = \int \log\frac{dp}{dQ}\,dp(x) = \int \log\frac{p(x)}{Q(x)}\,p(x)\,d\lambda(x) \qquad (1.1)$$

if $p \ll Q$, and $\infty$ otherwise. If $Y$ and $X$ are random variables with values in $\mathcal{X}$ and distributions $p$ resp. $Q$, then we shall also use the notation $D(Y\|X)$ for the relative entropy $D(p\|Q)$. Formula (1.1), with $\mathcal{X}$ replaced by $\mathcal{X}^n$, defines the relative entropy $D(p^n\|q^n)$ for measures $p^n$, $q^n$ on $\mathcal{X}^n$.
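For readers who want to experiment with (1.1), here is a minimal sketch for the special case of a finite set $\mathcal{X}$ with $\lambda$ the counting measure (an assumption made only for this illustration). The function name `relative_entropy` is hypothetical. It returns $\infty$ exactly when $p \ll Q$ fails.

```python
import numpy as np

def relative_entropy(p, q):
    """D(p||q) = sum_x p(x) log(p(x)/q(x)) for discrete distributions.

    Returns +inf when p is not absolutely continuous w.r.t. q,
    i.e. when p(x) > 0 for some x with q(x) = 0.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    support = p > 0                      # terms with p(x) = 0 contribute 0
    if np.any(q[support] == 0):
        return float("inf")              # p is not << q
    return float(np.sum(p[support] * np.log(p[support] / q[support])))

print(relative_entropy([0.5, 0.5], [0.5, 0.5]))   # 0.0
print(relative_entropy([1.0, 0.0], [0.5, 0.5]))   # log 2, about 0.6931
print(relative_entropy([0.5, 0.5], [1.0, 0.0]))   # inf
```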

To formulate the main result of this paper, we also need the concept of average 
(conditional) relative entropy: 
Notation. If we are given a probability measure $\pi = \operatorname{dist}(U)$ and conditional distributions $p(\cdot\mid u) = \operatorname{dist}(Y\mid U = u)$, $q(\cdot\mid u) = \operatorname{dist}(X\mid U = u)$, then for the average relative entropy

$$\mathbb{E}_\pi D(Y\mid U = u\,\|\,X\mid U = u)$$

we shall use either of the notations

$$D(Y\mid U\,\|\,X\mid U),\quad D(p(\cdot\mid U)\,\|\,X\mid U),\quad D(Y\mid U\,\|\,q(\cdot\mid U)),\quad D(p(\cdot\mid U)\,\|\,q(\cdot\mid U)).$$



Our goal is to prove an inequality of the form

$$D(p^n\|q^n) \le c(q^n)\cdot\sum_{i=1}^n D\big(p_i(\cdot\mid\bar Y_i)\,\|\,Q_i(\cdot\mid\bar Y_i)\big) \qquad (1.2)$$

for a fixed measure $q^n$, and any $p^n$, under some conditions of weak dependence to be specified later. Here $c(q^n)$ denotes a constant depending on $q^n$, but not on $p^n$. That is, we want to bound $D(p^n\|q^n)$ by the sum of the "single phase" conditional entropies $D(p_i(\cdot\mid\bar Y_i)\|Q_i(\cdot\mid\bar Y_i))$. Since $D(p_i(\cdot\mid\bar Y_i)\|Q_i(\cdot\mid\bar Y_i))$ measures, in a way, how different the conditional distributions $p_i(\cdot\mid\bar y_i)$ and $Q_i(\cdot\mid\bar y_i)$ are, such an inequality lets us infer the closeness of $p^n$ and $q^n$ from the closeness of their local specifications. Moreover, such an inequality ensures that upper bounds for the "single phase" relative entropies $D(p_i(\cdot\mid\bar y_i)\|Q_i(\cdot\mid\bar y_i))$ that hold uniformly in $\bar y_i$ yield a bound for $D(p^n\|q^n)$. This is a way to get logarithmic Sobolev inequalities for measures on product spaces.

Note that no inequality of type (1.2) holds in general.

To state the appropriate conditions for (1.2) we need the concept of quadratic 
Wasserstein distance. 

Definition. The quadratic Wasserstein distance, or $W$-distance, between the probability measures $p$ and $Q$ on $\mathcal{X}$ is defined as

$$W(p,Q) = \inf_\pi\,\big[\mathbb{E}_\pi d^2(Y,X)\big]^{1/2},$$

where $Y$ and $X$ are random variables with laws $p$ resp. $Q$, and the infimum is taken over all distributions $\pi = \operatorname{dist}(Y,X)$ with marginals $p$ and $Q$.
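On $\mathcal{X} = \mathbb{R}$ with $d(x,y) = |x-y|$, the infimum above is attained by the monotone coupling, which pairs quantiles. The following sketch uses this fact for two empirical measures with the same number of equally weighted atoms. The helper `w2_empirical_1d` is a hypothetical name, and the equal-size restriction is an assumption of the sketch.

```python
import numpy as np

def w2_empirical_1d(ys, xs):
    """Quadratic Wasserstein distance between two empirical measures on R
    with the same number of equally weighted atoms.

    On the real line the monotone (sorted) coupling is optimal for the
    cost d^2(y, x) = (y - x)^2, so the infimum over couplings is attained
    by pairing the order statistics.
    """
    ys = np.sort(np.asarray(ys, dtype=float))
    xs = np.sort(np.asarray(xs, dtype=float))
    if ys.shape != xs.shape:
        raise ValueError("this sketch assumes equally many atoms")
    return float(np.sqrt(np.mean((ys - xs) ** 2)))

# Shifting every atom by 3 gives W = 3, whatever order the atoms are listed in.
print(w2_empirical_1d([0.0, 1.0, 2.0], [5.0, 3.0, 4.0]))  # 3.0
```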

Definition. We say that the distribution $Q$ on $\mathcal{X}$ satisfies a distance-entropy inequality with constant $\rho$ if

$$W^2(p,Q) \le \frac{2}{\rho}\,D(p\|Q) \qquad (1.3)$$

for all probability measures $p$ on $\mathcal{X}$.






Distance-divergence inequalities were introduced by Marton [M1], [M2], [M3] for the case of the (non-quadratic) Wasserstein distance derived from Hamming distance. For the quadratic Wasserstein distance derived from Euclidean distance, the first distance-divergence inequality was proved by Talagrand [T], for Gaussian distributions. It was generalized by Otto and Villani [O-V] to measures satisfying a classical logarithmic Sobolev inequality. Otto and Villani's paper (and Villani's book [V]) called attention to the deep connection between quadratic distance-divergence inequalities and logarithmic Sobolev inequalities, which also plays a role in the present paper.
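Talagrand's Gaussian case can be checked numerically in dimension 1, using the closed forms $W^2(N(m_1,s_1^2),N(m_2,s_2^2)) = (m_1-m_2)^2 + (s_1-s_2)^2$ and the standard formula for the relative entropy between two normal laws. The sketch below assumes the reference measure $Q = N(0,\sigma^2)$, whose logarithmic Sobolev constant is $\rho = 1/\sigma^2$, and only tests Gaussian $p$ on a grid. It illustrates (1.3) and is not a proof.

```python
import numpy as np

def w2_sq_gauss_1d(m1, s1, m2, s2):
    # Closed form: W^2(N(m1, s1^2), N(m2, s2^2)) = (m1 - m2)^2 + (s1 - s2)^2
    return (m1 - m2) ** 2 + (s1 - s2) ** 2

def kl_gauss_1d(m1, s1, m2, s2):
    # D(N(m1, s1^2) || N(m2, s2^2))
    return np.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5

# Reference Q = N(0, sigma^2); its LSI constant is rho = 1/sigma^2, so the
# distance-entropy inequality (1.3) reads W^2(p, Q) <= 2 sigma^2 D(p || Q).
sigma = 1.5
rho = 1.0 / sigma ** 2
worst_gap = np.inf
for m in np.linspace(-3, 3, 13):
    for s in np.linspace(0.1, 4.0, 40):
        lhs = w2_sq_gauss_1d(m, s, 0.0, sigma)
        rhs = (2.0 / rho) * kl_gauss_1d(m, s, 0.0, sigma)
        worst_gap = min(worst_gap, rhs - lhs)
print(worst_gap >= -1e-12)  # True: (1.3) holds on the whole grid
```

For translates of $Q$ (that is, $s = \sigma$) the two sides coincide, so the constant $2/\rho$ cannot be improved in this family.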

Throughout this paper we consider measures $q^n(x^n) = \exp(-V(x^n))$ whose $i$-th local specification satisfies a distance-divergence inequality with constant $\rho_i$.

With these numbers $\rho_i$ in mind, we shall consider the following distance on $\mathcal{X}^n$:

$$d^{(n)}(x^n,y^n) = \Big(\sum_{i=1}^n \rho_i\, d(x_i,y_i)^2\Big)^{1/2},\qquad x^n, y^n \in \mathcal{X}^n.$$



Now we formulate the conditions we need to derive an inequality of the form (1.2) for the measure $q^n = \exp(-V(x^n))$.

Definition.

We say that the system of local specifications of the probability measure $q^n(x^n) = \exp(-V(x^n))$ satisfies the distance-entropy bound with constants $\rho_i$ if for every $i$, any sequence $\bar y_i$ and any density function $r$ on $\mathcal{X}$,

$$W^2\big(r, Q_i(\cdot\mid\bar y_i)\big) \le \frac{2}{\rho_i}\,D\big(r\,\|\,Q_i(\cdot\mid\bar y_i)\big). \qquad (1.3(\mathrm{DE}))$$



To formulate the second condition, fix a sequence $\zeta^n \in \mathcal{X}^n$, and define the functions $\Phi_\zeta, \Psi_\zeta : (\mathcal{X}^n)^2 \to \mathbb{R}$ by

$$\Phi_\zeta(t^n,y^n) = \sum_{i=1}^n V(\zeta^{i-1}, t_i, y_{i+1}^n)$$

and

$$\Psi_\zeta(t^n,y^n) = \sum_{i=1}^n V(y^{i-1}, t_i, \zeta_{i+1}^n).$$

Definition.

We say that the density function $q^n(x^n) = \exp(-V(x^n))$ satisfies the sub-quadratic bounds with constants $\rho_i$ and $\delta$ if for every quintuple $(\zeta^n, t^n, u^n, y^n, z^n) \in (\mathcal{X}^n)^5$

$$\Phi_\zeta(t^n,y^n) - \Phi_\zeta(t^n,z^n) - \Phi_\zeta(u^n,y^n) + \Phi_\zeta(u^n,z^n) \le \frac{1-\delta}{2}\cdot d^{(n)}(t^n,u^n)\cdot d^{(n)}(y^n,z^n), \qquad (1.4(\mathrm{SQ1}))$$

and

$$\Psi_\zeta(t^n,y^n) - \Psi_\zeta(t^n,z^n) - \Psi_\zeta(u^n,y^n) + \Psi_\zeta(u^n,z^n) \le \frac{1-\delta}{2}\cdot d^{(n)}(t^n,u^n)\cdot d^{(n)}(y^n,z^n). \qquad (1.5(\mathrm{SQ2}))$$

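With a less compact notation, (1.4(SQ1)) and (1.5(SQ2)) can be written as

$$\sum_{i=1}^n \big[V(\zeta^{i-1}, t_i, y_{i+1}^n) - V(\zeta^{i-1}, t_i, z_{i+1}^n) - V(\zeta^{i-1}, u_i, y_{i+1}^n) + V(\zeta^{i-1}, u_i, z_{i+1}^n)\big] \le \frac{1-\delta}{2}\cdot d^{(n)}(t^n,u^n)\cdot d^{(n)}(y^n,z^n), \qquad (1.6(\mathrm{SQ1}))$$

and

$$\sum_{i=1}^n \big[V(y^{i-1}, t_i, \zeta_{i+1}^n) - V(z^{i-1}, t_i, \zeta_{i+1}^n) - V(y^{i-1}, u_i, \zeta_{i+1}^n) + V(z^{i-1}, u_i, \zeta_{i+1}^n)\big] \le \frac{1-\delta}{2}\cdot d^{(n)}(t^n,u^n)\cdot d^{(n)}(y^n,z^n). \qquad (1.7(\mathrm{SQ2}))$$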
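To see what (1.6(SQ1)) and (1.7(SQ2)) measure, take a quadratic Hamiltonian $V(x) = \frac12 x^\top A x$ with a symmetric matrix $A$ (an assumption made for this sketch). Then only the cross terms between $t_i$ and $y_k$ survive in the mixed differences: $k > i$ for $\Phi_\zeta$ and $k < i$ for $\Psi_\zeta$. So the left-hand sides equal $(t-u)^\top A_{\mathrm{up}}(y-z)$ and $(t-u)^\top A_{\mathrm{low}}(y-z)$, with $A_{\mathrm{up}}$ and $A_{\mathrm{low}}$ the strictly upper and lower triangular parts of $A$. The code checks this identity on random inputs. Indices are 0-based, so `zeta[:i]` plays the role of $\zeta^{i-1}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.normal(size=(n, n))
A = M @ M.T + n * np.eye(n)          # hypothetical symmetric Hessian of V

def V(x):
    # Quadratic Hamiltonian V(x) = x^T A x / 2 (an assumption for this sketch)
    return 0.5 * x @ A @ x

def Phi(zeta, t, y):
    # Phi_zeta(t, y) = sum_i V(zeta^{i-1}, t_i, y_{i+1}^n)
    return sum(V(np.concatenate([zeta[:i], t[i:i + 1], y[i + 1:]])) for i in range(n))

def Psi(zeta, t, y):
    # Psi_zeta(t, y) = sum_i V(y^{i-1}, t_i, zeta_{i+1}^n)
    return sum(V(np.concatenate([y[:i], t[i:i + 1], zeta[i + 1:]])) for i in range(n))

zeta, t, u, y, z = (rng.normal(size=n) for _ in range(5))
mixed_phi = Phi(zeta, t, y) - Phi(zeta, t, z) - Phi(zeta, u, y) + Phi(zeta, u, z)
mixed_psi = Psi(zeta, t, y) - Psi(zeta, t, z) - Psi(zeta, u, y) + Psi(zeta, u, z)

# Only the cross terms t_i * y_k survive:
# k > i in Phi (strict upper triangle), k < i in Psi (strict lower triangle).
print(np.isclose(mixed_phi, (t - u) @ np.triu(A, 1) @ (y - z)))   # True
print(np.isclose(mixed_psi, (t - u) @ np.tril(A, -1) @ (y - z)))  # True
```

Consequently, for such $V$ on $\mathbb{R}^n$ the best constant in (1.4(SQ1)) is the spectral norm of $R^{-1/2}A_{\mathrm{up}}R^{-1/2}$ with $R = \operatorname{diag}(\rho_1,\dots,\rho_n)$, because $d^{(n)}(t^n,u^n) = |R^{1/2}(t-u)|$. The same holds for (1.5(SQ2)) with $A_{\mathrm{low}}$.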

We shall use the following comprehensive short-hand
Notation.

The probability measure $q^n(x^n) = \exp(-V(x^n))$ satisfies condition $\mathrm{DE}(\rho_i)\&\mathrm{SQ}(\rho_i,\delta)$ (distance-entropy & sub-quadratic bounds) if the distance-entropy and the sub-quadratic bounds hold with constants $\rho_i$ and $\delta$.

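Our main result is
Theorem 1.

Assume that the local specifications $Q_i(\cdot\mid\bar x_i)$ satisfy condition $\mathrm{DE}(\rho_i)\&\mathrm{SQ}(\rho_i,\delta)$. Then

$$D(p^n\|q^n) \le \frac{1}{\delta(1-\delta/2)}\cdot\sum_{i=1}^n D\big(p_i(\cdot\mid\bar Y_i)\,\|\,Q_i(\cdot\mid\bar Y_i)\big) \qquad (1.8)$$

for any probability distribution $p^n$ on $\mathcal{X}^n$.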



We believe that Theorem 1 is true without the factor $1-\delta/2$ in the denominator:
Conjecture.

If the local specifications $Q_i(\cdot\mid\bar x_i)$ satisfy condition $\mathrm{DE}(\rho_i)\&\mathrm{SQ}(\rho_i,\delta)$ then

$$D(p^n\|q^n) \le \frac{1}{\delta}\cdot\sum_{i=1}^n D\big(p_i(\cdot\mid\bar Y_i)\,\|\,Q_i(\cdot\mid\bar Y_i)\big).$$



By Theorem 1, any upper bound for the "single phase" relative entropies $D(p_i(\cdot\mid\bar y_i)\|Q_i(\cdot\mid\bar y_i))$ that holds uniformly in $\bar y_i$ yields a bound for $D(p^n\|q^n)$. This is a way to get logarithmic Sobolev inequalities for weakly dependent random variables.

1.2 Classical logarithmic Sobolev inequalities. 

If $p^n$ and $q^n$ are probability measures on the same Euclidean space $\mathbb{R}^n$, then $I(p^n\|q^n)$ will denote the Fisher information of $p^n$ with respect to $q^n$:

$$I(p^n\|q^n) = \int_{\mathbb{R}^n}\Big|\nabla\log\frac{p^n(x^n)}{q^n(x^n)}\Big|^2\,p^n(x^n)\,d\lambda^n(x^n),\quad\text{if }\log\frac{p^n}{q^n}\text{ is smooth }p^n\text{-a.e.}$$

Definition. The density function $q^n$ satisfies a classical logarithmic Sobolev inequality with constant $\rho > 0$ if for any density function $p^n$ on $\mathbb{R}^n$ with $\log(p^n/q^n)$ smooth $p^n$-a.e.,

$$D(p^n\|q^n) \le \frac{1}{2\rho}\,I(p^n\|q^n).$$

The classical logarithmic Sobolev inequality for $q^n$ can be used to control the rate of convergence to equilibrium for the diffusion semigroup associated with $q^n$, and is equivalent to the hypercontractivity of this semigroup. The prototype is Gross' logarithmic Sobolev inequality for the Gaussian measure, which is associated with the Ornstein-Uhlenbeck semigroup [Gr], [N]. Another use of logarithmic Sobolev inequalities is to derive transportation cost inequalities, a tool for proving measure concentration (F. Otto, C. Villani [O-V]). The classical logarithmic Sobolev inequality for spin systems is equivalent to the property called "exponential decay of correlation"; for this concept we refer to Bodineau and Helffer [B-H] and Helffer [H].

In this subsection we apply Theorem 1 to prove a logarithmic Sobolev inequality for measures on $\mathbb{R}^n$ with positive density

$$q^n(x^n) = \operatorname{dist}(X^n) = \exp(-V(x^n)),\qquad x^n \in \mathbb{R}^n,$$

under the assumption that the local specifications

$$Q_i(\cdot\mid x_j, j \ne i) = \operatorname{dist}(X_i\mid\bar X_i = \bar x_i)$$

satisfy logarithmic Sobolev inequalities with constants $\rho_i$ independent of $x^n$, together with some other condition expressing that the mixed partial derivatives of $V$ are not too large relative to the numbers $\rho_i$. We want a logarithmic Sobolev constant independent of $n$.

Much work has been done on this subject. The first results were obtained for spin systems with finite, and somewhat later for compact, phase spaces by J.-D. Deuschel and D. W. Holley [D-H], B. Zegarlinski [Z1], D. Stroock and B. Zegarlinski [S-Z1], [S-Z2], [S-Z3], L. Lu and H. T. Yau [L-Y] and others. (In the case of finite phase space another definition is needed for the Fisher information, and thus for the logarithmic Sobolev inequality.) In [S-Z3] Stroock and Zegarlinski prove the equivalence between the logarithmic Sobolev inequality and Dobrushin and Shlosman's strong mixing condition in the case of finite phase space. In [Z1], Zegarlinski gives an explicit bound for the logarithmic Sobolev constant, under conditions reminiscent of, but in fact quite different from, Dobrushin's uniqueness condition.

The study of the non-compact case started later. The first two essential results are combined in the next theorem, which is Bakry and Emery's celebrated convexity criterion [Ba-E], supplemented by Holley and Stroock's perturbation lemma [Ho-S]:

Theorem of Bakry and Emery + Holley and Stroock.

Let $q^n(x^n) = \exp(-V(x^n))$ be a density function on $\mathbb{R}^n$, and let $V$ be strictly convex at $\infty$, i.e., $V(x^n) = U(x^n) + K(x^n)$, where

$$\operatorname{Hess}(U)(x^n) = \big(\partial_{jk}U(x^n)\big) \ge c\cdot I_n$$

for some $c > 0$ (with $I_n$ the identity matrix), and $K(x^n)$ is bounded. Then $q^n$ satisfies a logarithmic Sobolev inequality with constant $\rho$, depending on $c$ and $\|K\|_\infty$:

$$\rho \ge c\cdot\exp(-4\|K\|_\infty).$$

In particular, if $V$ is uniformly strictly convex, i.e., $\operatorname{Hess}(V) \ge c\cdot I_n$, then $\rho \ge c$.
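As a sketch of how this bound is evaluated in the simplest setting, suppose (as an assumption) that $U$ is quadratic with a constant Hessian. Then $c$ can be taken to be the smallest eigenvalue of $\operatorname{Hess}(U)$, and the perturbation $K$ enters only through $\|K\|_\infty$. The function name is hypothetical.

```python
import numpy as np

def bakry_emery_holley_stroock_bound(c, K_sup_norm):
    # rho >= c * exp(-4 ||K||_inf), where Hess(U) >= c * I_n uniformly
    return c * np.exp(-4.0 * K_sup_norm)

A = np.array([[1.0, 0.2], [0.2, 1.0]])   # hypothetical constant Hessian of U
c = np.linalg.eigvalsh(A).min()          # Hess(U) >= c * I_n, here c = 0.8
print(c)                                            # about 0.8 (K = 0: rho >= c)
print(bakry_emery_holley_stroock_bound(c, 0.1))     # 0.8 * exp(-0.4), about 0.536
```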

We also recall the very important fact that a product distribution admits a logarithmic Sobolev inequality with constant $\rho$, provided the factors have logarithmic Sobolev constants $\ge \rho$.

In particular, a product distribution where all factors are uniformly bounded 
perturbations of uniformly log-concave distributions, admits a logarithmic Sobolev 
inequality with a controllable constant. The simplest case beyond this is when the 
local specifications are uniformly bounded perturbations of uniformly log-concave 
distributions, but there is a weak dependence between the coordinates. This was 
the case investigated by B. Helffer [He], B. Helffer and Th. Bodineau [Bod-He]. 
(See also M. Ledoux [L2], N. Yoshida [Y] and Chapter 5 in A. Guionnet and B. 
Zegarlinski [Gui-Z].) 
[He] and [Bod-He] prove the existence of a positive logarithmic Sobolev constant under (more or less) the conditions of Proposition 1.1 below. Their results do not say much about how small the mixed partial derivatives of $V$ should be relative to the logarithmic Sobolev constants of the $Q_i(\cdot\mid\bar x_i)$'s, nor do they provide any explicit lower bound on the logarithmic Sobolev constant of $q^n$. Our aim in writing the first versions of this paper was to improve on earlier results in this respect.

After an earlier version [M6] of the present paper was written, the paper by F. Otto and M. Reznikoff [O-R] appeared, which already contains an explicit bound for the logarithmic Sobolev constant, depending essentially on the same parameters; this bound is tight in some cases. Later in this section, we compare their result with ours.

To formulate a sufficient condition for $\mathrm{DE}(\rho_i)\&\mathrm{SQ}(\rho_i,\delta)$, define the triangular function matrices $B_1$ and $B_2$ as follows: For $y^n, z^n, \eta^n \in \mathbb{R}^n$ put

and define 

$$B_1(y^n,\eta^n,z^n) = \big(\beta_{i,k}(y^n,\eta^n,z^n)\big)_{i<k},\qquad B_2(y^n,\eta^n,z^n) = \big(\beta_{i,k}(y^n,\eta^n,z^n)\big)_{i>k}.$$



Definition. Assume that the local specifications $Q_i(\cdot\mid\bar x_i)$ satisfy logarithmic Sobolev inequalities with constants $\rho_i$. We say that the system of local specifications $(Q_i(\cdot\mid\bar x_i))$ satisfies the contractivity condition for partial derivatives with constants $\rho_i$ and $\delta$ if

$$\sup_{y^n,\eta^n,z^n}\big\|B_j(y^n,\eta^n,z^n)\big\| \le \frac{1}{2}(1-\delta),\qquad j = 1,2. \qquad (\mathrm{C})$$



Thus the smaller the logarithmic Sobolev constants of the local specifications, 
the stronger the constraint on the mixed partial derivatives. 

For the important special case when the mixed partial derivatives are constant, the matrices $B_1$, $B_2$ are numerical. Furthermore, we can use the following estimates: Denoting

$$\alpha_{i,k} = \sup_{x^n}|\partial_{i,k}V(x^n)|$$

and writing

$$A_1 = (\alpha_{i,k})_{i<k},\qquad A_2 = (\alpha_{i,k})_{i>k},$$

we have

$$\|B_j\| \le \|A_j\|,\qquad j = 1,2.$$
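The following sketch computes the largest $\delta$ allowed by (C) when the mixed partial derivatives are constant. Because the defining formula of $\beta_{i,k}$ is not reproduced above, the code assumes (hypothetically) that $\beta_{i,k} = \partial_{i,k}V/\sqrt{\rho_i\rho_k}$, the normalization that matches the weights in $d^{(n)}$. For a Gaussian $q^n$ with precision matrix $A$ the local specifications are Gaussian with variance $1/A_{ii}$, so we take $\rho_i = A_{ii}$. The function `contractivity_delta` is an illustrative name.

```python
import numpy as np

def contractivity_delta(A, rho):
    """Largest delta with ||B_j|| <= (1 - delta)/2, j = 1, 2.

    Assumptions (hypothetical, for illustration only): the mixed partials
    d_{ik} V = A[i, k] are constant, and B_1, B_2 are the strictly upper /
    lower triangular parts of A rescaled by 1/sqrt(rho_i * rho_k).
    """
    r = 1.0 / np.sqrt(np.asarray(rho, dtype=float))
    scaled = A * np.outer(r, r)                      # A[i,k] / sqrt(rho_i rho_k)
    B1 = np.triu(scaled, 1)                          # entries with i < k
    B2 = np.tril(scaled, -1)                         # entries with i > k
    norm = max(np.linalg.norm(B1, 2), np.linalg.norm(B2, 2))  # spectral norms
    return 1.0 - 2.0 * norm

A = np.array([[1.0, 0.2], [0.2, 1.0]])   # precision matrix of a Gaussian q^n
rho = np.diag(A)                         # Gaussian local specs: rho_i = A[i, i]
delta = contractivity_delta(A, rho)      # 1 - 2 * 0.2 = 0.6
theorem1_const = 1.0 / (delta * (1.0 - delta / 2.0))   # constant in (1.8)
print(delta, theorem1_const)
```

For this two-dimensional example the constant in (1.8) is $1/(0.6\cdot 0.7) \approx 2.38$.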



We shall use the following short-hand
Notation.

A system of local specifications $(Q_i(\cdot\mid\bar x_i))$ satisfies condition $\mathrm{LSI}\&\mathrm{C}(\rho_i,\delta)$ if the local specifications $Q_i(\cdot\mid\bar x_i)$ satisfy logarithmic Sobolev inequalities with constants $\rho_i$, and also the contractivity condition for partial derivatives (C) with constants $\rho_i$ and $\delta$.

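Proposition 1.1.

The $\mathrm{LSI}\&\mathrm{C}(\rho_i,\delta)$ condition implies the $\mathrm{DE}(\rho_i)\&\mathrm{SQ}(\rho_i,\delta)$ condition; consequently

$$D(p^n\|q^n) \le \frac{1}{\delta(1-\delta/2)}\cdot\sum_{i=1}^n D\big(p_i(\cdot\mid\bar Y_i)\,\|\,Q_i(\cdot\mid\bar Y_i)\big) \qquad (1.9)$$

for any probability distribution $p^n$ on $\mathbb{R}^n$.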

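Theorem 1 and Proposition 1.1 imply the following
Corollary.

If the local specifications $Q_i(\cdot\mid\bar x_i)$ satisfy condition $\mathrm{LSI}\&\mathrm{C}(\rho_i,\delta)$ then for any density $p^n$,

$$D(p^n\|q^n) \le \frac{1}{\delta(1-\delta/2)}\cdot\frac{1}{2\rho}\,I(p^n\|q^n), \qquad (1.10)$$

where $\rho = \min_i \rho_i$.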
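A sketch of how (1.10) follows from (1.9). If $\log(p^n/q^n)$ is smooth, then $\partial_i\log(p^n/q^n) = \partial_i\log\big(p_i(\cdot\mid\bar y_i)/Q_i(\cdot\mid\bar y_i)\big)$, because the marginal densities of $\bar Y_i$ and $\bar X_i$ do not depend on $y_i$. Hence the Fisher information splits as $I(p^n\|q^n) = \sum_i \mathbb{E}\,I\big(p_i(\cdot\mid\bar Y_i)\|Q_i(\cdot\mid\bar Y_i)\big)$, and the one-dimensional logarithmic Sobolev inequalities for the $Q_i$ give:

```latex
\begin{aligned}
D(p^n\|q^n)
 &\le \frac{1}{\delta(1-\delta/2)}\sum_{i=1}^n D\big(p_i(\cdot\mid\bar Y_i)\,\|\,Q_i(\cdot\mid\bar Y_i)\big) \\
 &\le \frac{1}{\delta(1-\delta/2)}\sum_{i=1}^n \frac{1}{2\rho_i}\,
      \mathbb{E}\, I\big(p_i(\cdot\mid\bar Y_i)\,\|\,Q_i(\cdot\mid\bar Y_i)\big) \\
 &\le \frac{1}{\delta(1-\delta/2)}\cdot\frac{1}{2\rho}\, I(p^n\|q^n),
 \qquad \rho = \min_i \rho_i .
\end{aligned}
```

In other words, the logarithmic Sobolev constant of $q^n$ is at least $\rho\,\delta(1-\delta/2)$.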

Remark. Note that (1.9) does not contain the Fisher information, so it is concep- 
tually much simpler than (1.10). 

Proof of Proposition 1.1.

The basic tool in this proof is a result by Otto and Villani [O-V], establishing a deep connection between the logarithmic Sobolev inequality and a transportation cost inequality for quadratic Wasserstein distance. It holds in $\mathbb{R}^n$ and even on manifolds, but we only need it on $\mathbb{R}$.

Theorem of Otto and Villani in dimension 1. [O-V], [B-G-L]

If the density function $q(x)$ on $\mathbb{R}$ satisfies a classical logarithmic Sobolev inequality with constant $\rho$, then for any density $p$ on $\mathbb{R}$,

$$W^2(p,q) \le \frac{2}{\rho}\,D(p\|q).$$

Assume that the local specifications $Q_i(\cdot\mid\bar x_i)$ satisfy $\mathrm{LSI}\&\mathrm{C}(\rho_i,\delta)$. By the Otto-Villani theorem this implies the distance-entropy bound (1.3(DE)).

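To verify the sub-quadratic condition, observe that

$$\Phi_\zeta(w^n,t^n) - \Phi_\zeta(v^n,t^n) - \Phi_\zeta(w^n,z^n) + \Phi_\zeta(v^n,z^n) \le \sup_{u^n,t^n}\Big\|\Big(\frac{1}{\sqrt{\rho_i\rho_k}}\,\partial_{u_i}\partial_{t_k}\Phi_\zeta(u^n,t^n)\Big)_{i,k}\Big\|\cdot d^{(n)}(w^n,v^n)\cdot d^{(n)}(t^n,z^n).$$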
Since

the sub-quadratic bound (1.4(SQ1)) follows from the contractivity bound for partial derivatives. The sub-quadratic bound (1.5(SQ2)) can be proved similarly.
□

Otto and Reznikoff's Theorem 1 in [O-R] gives a logarithmic Sobolev inequality of a form very similar to the above Corollary. They assume a contractivity bound for partial derivatives, using a matrix $A$ obtained by putting together the triangular matrices $A_1$ and $A_2$ defined above (which consist of bounds on the absolute values of the mixed partial derivatives). The Otto-Reznikoff condition may be stronger than our $\mathrm{LSI}\&\mathrm{C}(\rho_i,\delta)$ condition. This is the case when there are both positive and negative entries in the matrices $B_j$. Thus there are Gaussian distributions to which the above Corollary applies, but the Otto-Reznikoff theorem does not. On the other hand, in some other cases the Otto-Reznikoff theorem gives a slightly better bound. This is the case for Gaussian distributions where all the mixed partial derivatives of $V(x^n)$ are non-negative. In this case the Otto-Reznikoff bound is tight, while ours is not. If the conjecture formulated after Theorem 1 turns out to be true, then it will give a bound that is always better than Otto and Reznikoff's.

Note that Theorem 1 does not contain the Fisher information. We chose this abstract form because we hope that our method of proof might serve as a pattern in other settings where a different notion of Fisher information may be needed.

1.3 On the method 

Our proof of Theorem 1 is quite different from the approach taken by Bodineau and Helffer, and also from the approach by Otto and Reznikoff.

We use a discrete time interpolation connecting the distributions $p^n$ and $q^n$, which is a modification of the Gibbs sampler for $q^n$. It may seem somewhat artificial, but we could not find any simpler interpolation doing the job. The difficulty is that although the $\mathrm{LSI}\&\mathrm{C}(\rho_i,\delta)$ condition ensures contractivity of the Gibbs sampler with respect to Wasserstein distance, the proof of Theorem 1 would need contractivity with respect to relative entropy. We circumvent this difficulty by means of Lemma 5.1 below.

In [M3, Theorem 2] (see also [M4]) we considered distributions $q^n$ satisfying conditions similar to $\mathrm{LSI}\&\mathrm{C}(\rho_i,\delta)$, and proved a transportation cost inequality. In view of the Otto-Villani theorem, this is weaker than a logarithmic Sobolev inequality. But the contractivity condition for partial derivatives in [M2] is weaker and more natural than the condition in the present paper. Furthermore, in [M2???] we also considered the more general case of local specifications $Q_I(\cdot\mid\bar x_I)$, where $I$ runs over a collection of (small) subsets of $[1,n]$, and $Q_I(\cdot\mid\bar x_I)$ is the joint conditional density function of the random variables $(X_i : i \in I)$, given the values $(X_j : j \notin I)$. We did not aim at this generality in the present paper.
1.4 On the limits of the method.

The $\mathrm{LSI}\&\mathrm{C}(\rho_i,\delta)$ condition depends on the system of coordinates. This is a serious drawback, although natural in the case of spin systems. Moreover, the $\mathrm{LSI}\&\mathrm{C}(\rho_i,\delta)$ condition also depends on the ordering of coordinates (see below).

Because of the dependence on the system of coordinates, there are important families of distributions $q^n(x^n) = \exp(-V(x^n))$ (with $n$ growing) that admit a logarithmic Sobolev inequality (with a constant independent of $n$) without satisfying an $\mathrm{LSI}\&\mathrm{C}(\rho_i,\delta)$ condition. In fact, this is the case with many convex quadratic functions $V(x^n)$, e.g.,

$$V_0(x^n) = \frac{1}{2}\sum_{i=1}^n x_i^2 + \frac{1}{2}\Big(\sum_{i=1}^n x_i - M\Big)^2 + \text{const.},\qquad M \in \mathbb{R}\text{ fixed}.$$

(The results by Bodineau and Helffer, and those by Otto and Reznikoff, share this problem.) In a paper on conservative spin systems, Landim, Panizo and Yau [L-P-Y] proved a logarithmic Sobolev inequality for the following class of densities $\exp(-V(x^n))$:

$$V(x^n) = V_0(x^n) + \sum_{i=1}^n \phi(x_i),$$

where $\phi : \mathbb{R} \to \mathbb{R}$ is bounded and has bounded first and second derivatives. It would be nice to find a common way to prove our perturbative theorem and the theorem of Landim, Panizo and Yau [L-P-Y], but so far these two directions could not be united; in fact, [L-P-Y] contains the only non-perturbative result for the non-compact case.
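The quadratic example $V_0$ can be examined numerically. Its Hessian is $I_n + \mathbf{1}\mathbf{1}^\top$, whose smallest eigenvalue is $1$ for every $n \ge 2$, so by the Bakry-Emery criterion $q^n$ has logarithmic Sobolev constant $1$ independently of $n$. The sketch below assumes the hypothetical normalization $\beta_{i,k} = \partial_{i,k}V/\sqrt{\rho_i\rho_k}$ (the defining formula of $\beta_{i,k}$ is not reproduced in this text), with $\rho_i = 2$ because the local specifications are Gaussian with variance $1/2$. Under that assumption $\|B_1\|$ is already $1/2$ for $n = 2$ and grows with $n$, so no $\delta > 0$ is admissible in (C).

```python
import numpy as np

def hessian_V0(n):
    # Hess V_0 = I_n + (all-ones matrix); it does not depend on x or on M
    return np.eye(n) + np.ones((n, n))

sizes = (2, 4, 8, 16, 32)
b1_norms, lsi_consts = [], []
for n in sizes:
    A = hessian_V0(n)
    rho = np.diag(A)                                    # Gaussian local specs: rho_i = 2
    scaled = A / np.sqrt(np.outer(rho, rho))            # hypothetical normalization of beta
    b1_norms.append(np.linalg.norm(np.triu(scaled, 1), 2))  # ||B_1|| (spectral norm)
    lsi_consts.append(np.linalg.eigvalsh(A).min())      # exact LSI constant of Gaussian q^n

for n, b, lam in zip(sizes, b1_norms, lsi_consts):
    print(f"n={n:2d}  ||B_1||={b:.3f}  LSI constant={lam:.3f}")
```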

The definition of the $\delta$-contractivity condition for partial derivatives is not very transparent, partly because it depends on the ordering of the index set. If the indices are nodes of a lattice in a Euclidean space, and if $V$ is sufficiently symmetric, then the following consideration may help. The definition of the local specifications can often be extended in a natural way to infinite sequences $y = (y_i)$ indexed by the nodes of the entire (infinite) lattice. Let us consider the lexicographical ordering on the nodes (i.e., on the index set). For every node $i$, the symmetry with center $i$ is a bijection between the nodes preceding resp. following $i$, and it often happens that if nodes $j$ and $k$ are interchanged by this bijection then

$$\beta_{i,j}(y,\eta,z) = \beta_{i,k}(z^*,\eta^*,y^*),$$

where $y^*$ denotes the sequence defined as follows:

$$y^*_i = y_i,\qquad y^*_k = y_j\ \text{ if } j \text{ and } k \text{ are interchanged by the bijection}.$$

With this ordering of the indices it is often possible to give effective bounds for $\|B_1\|$ and $\|B_2\|$.
Recently several extensions of the logarithmic Sobolev inequality were invented for generalized entropy functionals. See works by B. Zegarlinski [Z3], S. Bobkov and B. Zegarlinski [B-Z], C. Roberto and B. Zegarlinski [R-Z], F. Barthe, P. Cattiaux and C. Roberto [Bar-C-R], D. Cordero-Erausquin, W. Gangbo and C. Houdre [CE-Ga-Hou]. In these works, relative entropy and Fisher information are replaced by
more general functionals. In [Z3] and [B-Z] some of these more general inequalities 
are also proved for Gibbs measures (without explicit bounds). At this moment it is 
not clear whether the methods of this paper can be extended to prove results like 
those in [Z3] and [B-Z]. 



2. An auxiliary theorem for estimating relative entropy. 

In this section we prove the following 

Auxiliary Theorem. Let $X^n$ be a random sequence with local specifications $Q_i(\cdot\mid\bar x_i)$, and let $(Y^n(t) : t = 0, 1, \dots)$ be a discrete time random process in $\mathcal{X}^n$. Then:

(i) For any $s \ge 0$

$$D(Y^n(0)\|X^n) \le \sum_{t=0}^{s}\sum_{i=1}^n D\big(Y_i(t)\mid Y^{i-1}(t), Y_{i+1}^n(t+1)\,\big\|\,Q_i(\cdot\mid Y^{i-1}(t), Y_{i+1}^n(t+1))\big) + D(Y^n(s+1)\|X^n), \qquad (2.1)$$

or, equivalently,

$$D(Y^n(s)\|X^n) - D(Y^n(s+1)\|X^n) \le \sum_{i=1}^n D\big(Y_i(s)\mid Y^{i-1}(s), Y_{i+1}^n(s+1)\,\big\|\,Q_i(\cdot\mid Y^{i-1}(s), Y_{i+1}^n(s+1))\big). \qquad (2.1')$$

(ii) If, in particular,

$$\lim_{s\to\infty} D(Y^n(s)\|X^n) = 0 \quad\text{along some subsequence}, \qquad (2.2)$$

then

$$D(Y^n(0)\|X^n) \le \sum_{t=0}^{\infty}\sum_{i=1}^n D\big(Y_i(t)\mid Y^{i-1}(t), Y_{i+1}^n(t+1)\,\big\|\,Q_i(\cdot\mid Y^{i-1}(t), Y_{i+1}^n(t+1))\big). \qquad (2.3)$$

(iii) If the sequence $(Y^n(t))$ is the Gibbs sampler, i.e., for all $t \ge 0$ and $1 \le i \le n$

$$\operatorname{dist}\big(Y_i(t+1)\mid Y^{i-1}(t) = y^{i-1}(t), Y_{i+1}^n(t+1) = y_{i+1}^n(t+1)\big) = Q_i(\cdot\mid y^{i-1}(t), y_{i+1}^n(t+1)), \qquad (2.4)$$

then (2.1) holds with equality. Thus in this case

$$\sum_{t=0}^{\infty}\sum_{i=1}^n D\big(Y_i(t)\mid Y^{i-1}(t), Y_{i+1}^n(t+1)\,\big\|\,Q_i(\cdot\mid Y^{i-1}(t), Y_{i+1}^n(t+1))\big) \le D(Y^n(0)\|X^n). \qquad (2.5)$$



Remark 1.

A frequently used tool in bounding relative entropy is the decomposition

$$D(Y^n\|X^n) = \sum_{i=1}^n D\big(Y_i\mid Y^{i-1}\,\big\|\,q_i(\cdot\mid Y^{i-1})\big),$$

where $q_i(\cdot\mid x^{i-1}) = \operatorname{dist}(X_i\mid X^{i-1} = x^{i-1})$. This decomposition has the drawback that $D(Y_i\mid Y^{i-1}\|q_i(\cdot\mid Y^{i-1}))$ cannot easily be bounded by a relative entropy with respect to some conditional density $Q_i(\cdot\mid\cdot)$. The Auxiliary Theorem bounds $D(Y^n\|X^n)$ by an infinite sum of relative entropies, all with respect to some $Q_i(\cdot\mid\cdot)$.

Remark 2.

For given distributions $(\operatorname{dist}(Y^n(t)), t = 0, 1, 2, \dots)$ there are many joint distributions $\operatorname{dist}(Y^n(t), Y^n(t+1))$. The terms in the sum (2.1) do depend on these joint distributions, but only through the joint distributions $\operatorname{dist}(Y^i(t), Y_{i+1}^n(t+1))$.

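Proof of the Auxiliary Theorem.

To prove (2.1) for $s = 0$, we iterate the following inequality: For any random sequence $Y^n$ and random variable $Z_n$, with an arbitrary joint distribution,

$$D(Y^n\|X^n) \le D\big(Y_n\mid Y^{n-1}\,\big\|\,Q_n(\cdot\mid Y^{n-1})\big) + D\big((Y^{n-1}, Z_n)\,\big\|\,X^n\big). \qquad (2.6)$$

This holds because of the chain rule for relative entropy (note that $Q_n(\cdot\mid x^{n-1})$ is also the conditional density of $X_n$ given $X^{n-1}$) and because relative entropy does not increase when we pass to marginals:

$$D(Y^n\|X^n) = D(Y^{n-1}\|X^{n-1}) + D\big(Y_n\mid Y^{n-1}\,\big\|\,Q_n(\cdot\mid Y^{n-1})\big) \le D\big(Y_n\mid Y^{n-1}\,\big\|\,Q_n(\cdot\mid Y^{n-1})\big) + D\big((Y^{n-1}, Z_n)\,\big\|\,X^n\big). \qquad (2.7)$$

By recursion on $i$, (2.7) implies (2.1) for $s = 0$:

$$D(Y^n(0)\|X^n) \le \sum_{i=1}^n D\big(Y_i(0)\mid Y^{i-1}(0), Y_{i+1}^n(1)\,\big\|\,Q_i(\cdot\mid Y^{i-1}(0), Y_{i+1}^n(1))\big) + D(Y^n(1)\|X^n). \qquad (2.8)$$

From (2.8) we get (2.1) by another recursion, on $s$.

To prove (iii), note that if (2.4) holds then (2.7) holds with equality, and so does (2.8). □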
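The two facts used in (2.7), the chain rule and the monotonicity of relative entropy under taking marginals, can be checked on a small discrete example with $n = 2$. A finite alphabet and randomly generated joint laws are assumptions of this sketch. For $n = 2$ the local specification $Q_2(\cdot\mid x_1)$ is simply the conditional law of $X_2$ given $X_1$. The last check is the equality case behind (iii): if $Z_2$ is drawn from $Q_2(\cdot\mid Y_1)$, then (2.6) holds with equality.

```python
import numpy as np

def kl(p, q):
    # Discrete relative entropy D(p || q); assumes q > 0 wherever p > 0.
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

rng = np.random.default_rng(1)
k = 4
q = rng.random((k, k)) + 0.01                  # joint law of X^2 = (X_1, X_2)
q /= q.sum()
p = rng.random((k, k)) + 0.01                  # joint law of Y^2 = (Y_1, Y_2)
p /= p.sum()

q1, p1 = q.sum(axis=1), p.sum(axis=1)          # laws of X_1 and Y_1
Q2 = q / q1[:, None]                           # Q_2(. | x_1)
P2 = p / p1[:, None]                           # dist(Y_2 | Y_1 = y_1)

# Average conditional relative entropy D(Y_2 | Y_1 || Q_2(. | Y_1))
cond = sum(p1[a] * kl(P2[a], Q2[a]) for a in range(k))

# First line of (2.7): chain rule
chain_ok = np.isclose(kl(p, q), kl(p1, q1) + cond)

# (2.6): replace Y_2 by any Z_2 coupled with Y_1 through a random kernel K
K = rng.random((k, k)) + 0.01
K /= K.sum(axis=1, keepdims=True)
r = p1[:, None] * K                            # joint law of (Y_1, Z_2)
ineq_ok = kl(p, q) <= cond + kl(r, q) + 1e-12

# Equality case: Z_2 drawn from Q_2(. | Y_1)
eq_ok = np.isclose(kl(p, q), cond + kl(p1[:, None] * Q2, q))

print(chain_ok, ineq_ok, eq_ok)  # True True True
```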

The Auxiliary Theorem is fairly easy to prove, but to use it we need a process $(Y^n(t), t = 1, 2, \dots)$ that admits good estimates for the terms in the sum (2.3), and also satisfies (2.2). The construction and analysis of such a process is the subject of the rest of the paper.
3. The Entropy-Distance inequality.

An important tool in the proof of Theorem 1 is the following notion:
Definition.

We say that the system of local specifications of $q^n$ satisfies the entropy-distance bound with constants $\rho_i$ and $\delta$ if:

For any quadruple of sequences $(y^n(1), y^n(2), z^n(1), z^n(2)) \in (\mathcal{X}^n)^4$

$$\sum_{i=1}^n D\big(Q_i(\cdot\mid y^{i-1}(1), y_{i+1}^n(2))\,\big\|\,Q_i(\cdot\mid z^{i-1}(1), z_{i+1}^n(2))\big) \le \frac{(1-\delta)^2}{8}\cdot\big[d^{(n)}(y^n(1),z^n(1)) + d^{(n)}(y^n(2),z^n(2))\big]^2. \qquad (3.1(\mathrm{ED}))$$



In this section we prove the following 



Lemma 3.1.

If the system of local specifications of $q^n$ satisfies condition $\mathrm{DE}(\rho_i)\&\mathrm{SQ}(\rho_i,\delta)$ then it satisfies the entropy-distance bound with the same constants.



Proof.

Define the function $F : (\mathcal{X}^n)^3 \to \mathbb{R}$ by

$$F(y^n,\theta^n,u^n) = \sum_{i=1}^n V(y^{i-1}, \theta_i, u_{i+1}^n).$$

By the identity

$$\begin{aligned}
&F(y^n,\theta^n,u^n) - F(z^n,\theta^n,v^n) - F(y^n,\tau^n,u^n) + F(z^n,\tau^n,v^n) \\
&\quad= \Phi_y(\theta^n,u^n) - \Phi_y(\theta^n,v^n) - \Phi_y(\tau^n,u^n) + \Phi_y(\tau^n,v^n) \\
&\qquad+ \Psi_v(\theta^n,y^n) - \Psi_v(\theta^n,z^n) - \Psi_v(\tau^n,y^n) + \Psi_v(\tau^n,z^n),
\end{aligned}$$

valid for all sextuples $(y^n, z^n, u^n, v^n, \theta^n, \tau^n) \in (\mathcal{X}^n)^6$, the bounds (1.4(SQ1)) and (1.5(SQ2)) imply

$$F(y^n,\theta^n,u^n) - F(z^n,\theta^n,v^n) - F(y^n,\tau^n,u^n) + F(z^n,\tau^n,v^n) \le \frac{1-\delta}{2}\cdot d^{(n)}(\theta^n,\tau^n)\cdot\big[d^{(n)}(y^n,z^n) + d^{(n)}(u^n,v^n)\big]. \qquad (3.2)$$
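We also have the following series of identities:

$$\begin{aligned}
&\sum_{i=1}^n D\big(Q_i(\cdot\mid y^{i-1}(1), y_{i+1}^n(2))\,\big\|\,Q_i(\cdot\mid z^{i-1}(1), z_{i+1}^n(2))\big) + \sum_{i=1}^n D\big(Q_i(\cdot\mid z^{i-1}(1), z_{i+1}^n(2))\,\big\|\,Q_i(\cdot\mid y^{i-1}(1), y_{i+1}^n(2))\big) \\
&\quad= \sum_{i=1}^n \int Q_i(t_i\mid y^{i-1}(1), y_{i+1}^n(2))\log\frac{Q_i(t_i\mid y^{i-1}(1), y_{i+1}^n(2))}{Q_i(t_i\mid z^{i-1}(1), z_{i+1}^n(2))}\,d\lambda_i(t_i) \\
&\qquad+ \sum_{i=1}^n \int Q_i(t_i\mid z^{i-1}(1), z_{i+1}^n(2))\log\frac{Q_i(t_i\mid z^{i-1}(1), z_{i+1}^n(2))}{Q_i(t_i\mid y^{i-1}(1), y_{i+1}^n(2))}\,d\lambda_i(t_i) \\
&\quad= \mathbb{E}\sum_{i=1}^n\Big[\log\frac{Q_i(\theta_i\mid y^{i-1}(1), y_{i+1}^n(2))}{Q_i(\theta_i\mid z^{i-1}(1), z_{i+1}^n(2))} - \log\frac{Q_i(\tau_i\mid y^{i-1}(1), y_{i+1}^n(2))}{Q_i(\tau_i\mid z^{i-1}(1), z_{i+1}^n(2))}\Big],
\end{aligned} \qquad (3.3)$$

where $\theta^n$ and $\tau^n$ are random sequences with independent components, distributed according to

$$\operatorname{dist}(\theta_i) = Q_i(\cdot\mid y^{i-1}(1), y_{i+1}^n(2)),\qquad \operatorname{dist}(\tau_i) = Q_i(\cdot\mid z^{i-1}(1), z_{i+1}^n(2)).$$

Since the components are independent, we can define the joint distribution $\operatorname{dist}(\theta^n,\tau^n)$ by coupling each pair $(\theta_i,\tau_i)$ optimally, independently for different $i$, so as to achieve

$$\mathbb{E}\big(d^{(n)}\big)^2(\theta^n,\tau^n) = \sum_{i=1}^n \rho_i\, W^2\big(Q_i(\cdot\mid y^{i-1}(1), y_{i+1}^n(2)), Q_i(\cdot\mid z^{i-1}(1), z_{i+1}^n(2))\big). \qquad (3.4)$$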



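Let us introduce the notations

$$D_y = \sum_{i=1}^n D\big(Q_i(\cdot\mid y^{i-1}(1), y_{i+1}^n(2))\,\big\|\,Q_i(\cdot\mid z^{i-1}(1), z_{i+1}^n(2))\big),$$

$$D_z = \sum_{i=1}^n D\big(Q_i(\cdot\mid z^{i-1}(1), z_{i+1}^n(2))\,\big\|\,Q_i(\cdot\mid y^{i-1}(1), y_{i+1}^n(2))\big).$$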



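Since $Q_i(t\mid\bar x_i)$ is proportional to $\exp(-V(x^{i-1}, t, x_{i+1}^n))$, the normalizing factors cancel and

$$\begin{aligned}
&\sum_{i=1}^n\Big[\log\frac{Q_i(\theta_i\mid y^{i-1}(1), y_{i+1}^n(2))}{Q_i(\theta_i\mid z^{i-1}(1), z_{i+1}^n(2))} - \log\frac{Q_i(\tau_i\mid y^{i-1}(1), y_{i+1}^n(2))}{Q_i(\tau_i\mid z^{i-1}(1), z_{i+1}^n(2))}\Big] \\
&\quad= F(y^n(1),\tau^n,y^n(2)) - F(z^n(1),\tau^n,z^n(2)) - F(y^n(1),\theta^n,y^n(2)) + F(z^n(1),\theta^n,z^n(2)).
\end{aligned}$$

Therefore (3.2) (applied with $\theta^n$ and $\tau^n$ interchanged), (3.3), (3.4) and the Cauchy-Schwarz inequality $\mathbb{E}\,d^{(n)}(\theta^n,\tau^n) \le \big(\mathbb{E}(d^{(n)})^2(\theta^n,\tau^n)\big)^{1/2}$ imply that

$$\frac{1}{2}\cdot\frac{D_y + D_z}{\Big(\sum_{i=1}^n \rho_i W^2\big(Q_i(\cdot\mid y^{i-1}(1), y_{i+1}^n(2)), Q_i(\cdot\mid z^{i-1}(1), z_{i+1}^n(2))\big)\Big)^{1/2}} \le \frac{1-\delta}{4}\cdot\big[d^{(n)}(y^n(1),z^n(1)) + d^{(n)}(y^n(2),z^n(2))\big]. \qquad (3.5)$$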
By the distance-entropy bound,

$$\sum_{i=1}^n \rho_i W^2\big(Q_i(\cdot\mid y^{i-1}(1), y_{i+1}^n(2)), Q_i(\cdot\mid z^{i-1}(1), z_{i+1}^n(2))\big) \le 2\cdot\min\{D_y, D_z\}. \qquad (3.6)$$
i=i 

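Substituting (3.6) into (3.5) and taking squares:

$$D_y\cdot D_z \le \frac{(1-\delta)^2}{8}\cdot\min\{D_y,D_z\}\cdot\big[d^{(n)}(y^n(1),z^n(1)) + d^{(n)}(y^n(2),z^n(2))\big]^2, \qquad (3.7)$$

i.e.,

$$\max\{D_y,D_z\} \le \frac{(1-\delta)^2}{8}\cdot\big[d^{(n)}(y^n(1),z^n(1)) + d^{(n)}(y^n(2),z^n(2))\big]^2. \qquad\square$$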
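The step from (3.5) and (3.6) to (3.7) uses the elementary inequality $4D_yD_z \le (D_y + D_z)^2$. Writing $\Delta = d^{(n)}(y^n(1),z^n(1)) + d^{(n)}(y^n(2),z^n(2))$ for brevity, and $W^2(\cdot,\cdot)$ for the Wasserstein terms appearing in (3.5), the computation reads:

```latex
\begin{aligned}
D_y + D_z
 &\le \frac{1-\delta}{2}\,\Big(\sum_{i=1}^n \rho_i W^2(\cdot,\cdot)\Big)^{1/2}\Delta
  && \text{by (3.5)} \\
 &\le \frac{1-\delta}{2}\,\big(2\min\{D_y,D_z\}\big)^{1/2}\Delta
  && \text{by (3.6)}, \\
4\,D_y D_z \le (D_y + D_z)^2
 &\le \frac{(1-\delta)^2}{2}\,\min\{D_y,D_z\}\,\Delta^2 .
\end{aligned}
```

After division by 4 this is (3.7). Dividing further by $\min\{D_y,D_z\}$ (the claim is trivial if it vanishes) and using $D_yD_z/\min\{D_y,D_z\} = \max\{D_y,D_z\}$ gives the final bound. Since $D_y$ is the left-hand side of (3.1(ED)), this yields Lemma 3.1.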



Note that conditions (1.3(DE)) and (3.1(ED)) imply a strong form of contractivity: For any quadruple of sequences $(y^n(1), y^n(2), z^n(1), z^n(2))$

$$\sum_{i=1}^n \rho_i W^2\big(Q_i(\cdot\mid y^{i-1}(1), y_{i+1}^n(2)), Q_i(\cdot\mid z^{i-1}(1), z_{i+1}^n(2))\big) \le \frac{(1-\delta)^2}{4}\cdot\big[d^{(n)}(y^n(1),z^n(1)) + d^{(n)}(y^n(2),z^n(2))\big]^2. \qquad (3.8(\mathrm{CO}))$$



Condition (3.8(CO)) is somewhat stronger than the usual contractivity condi- 
tion, which is the same thing with y n (l) = y n (2) and z n (l) = z n (2), and which can 
be considered as a version of Dobrushin's uniqueness condition [D]. 

It is well known that the usual contractivity condition, and thus condition (CO), implies the existence and uniqueness of a probability measure $q^n$ compatible with the given local specifications. This will be the case throughout the paper.
4. The interpolation processes

Let us define a Markov chain

$$(Y^n(t) : t = 0, 1, \dots)$$

as follows: Let $\operatorname{dist}(Y^n(0)) = p^n$, and define the conditional distribution $\operatorname{dist}(Y^n(t+1)\mid Y^n(t))$ by the Markov kernel $G$, where

Definition.

The Markov operator $G$ (on the probability measures on $\mathbb{R}^n$) is defined by the transition function

$$G(v^n\mid u^n) = \prod_{i=1}^n Q_i(v_i\mid u^{i-1}, v_{i+1}^n). \qquad (4.1)$$

If $\pi^n$ is a probability measure on $\mathbb{R}^n$ then $\pi^n\cdot G$ shall denote the image of $\pi^n$ under $G$:

$$\pi^n\cdot G(\cdot) = \int G(\cdot\mid u^n)\,\pi^n(u^n)\,d\lambda^n(u^n).$$

Note that if a density function $q^n$ has local specifications $Q_i$ then it is invariant with respect to $G$.

Sometimes we shall denote $Y^n(0)$ by $Y^n$.

$(Y^n(t))$ is a variant of the Gibbs sampler for $q^n$. It is important that it satisfies (2.4) and, consequently, (2.5).
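A minimal simulation sketch of one step of such a sampler, for a Gaussian $q^n$ with precision matrix $A$ (an assumption for illustration), where each local specification is $Q_i(\cdot\mid\bar x_i) = N\big(-\sum_{j\ne i}A_{ij}x_j/A_{ii},\,1/A_{ii}\big)$. Coordinates are resampled in the order $n, n-1, \dots, 1$, so coordinate $i$ is drawn given the old values of coordinates $1,\dots,i-1$ and the new values of coordinates $i+1,\dots,n$, as in (2.4). The check illustrates the invariance of $q^n$: exact draws from $q^n$ keep their covariance after one step, up to Monte Carlo error. The function name `gibbs_step` is hypothetical.

```python
import numpy as np

def gibbs_step(u, A, rng):
    """One Gibbs step for the Gaussian q^n with precision matrix A.

    u has shape (N, n): N independent chains. Coordinates are updated in
    place from i = n-1 down to 0 (0-based), so when coordinate i is redrawn,
    v[:, :i] still holds the old values and v[:, i+1:] the new ones.
    """
    v = u.copy()
    n = A.shape[0]
    for i in reversed(range(n)):
        others = np.delete(np.arange(n), i)
        mean = -(v[:, others] @ A[i, others]) / A[i, i]   # conditional mean
        v[:, i] = mean + rng.normal(size=v.shape[0]) / np.sqrt(A[i, i])
    return v

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.2], [0.2, 1.0]])
cov = np.linalg.inv(A)                                        # covariance of q^n
x = rng.multivariate_normal(np.zeros(2), cov, size=200_000)   # exact draws from q^n
x_next = gibbs_step(x, A, rng)
err = np.abs(np.cov(x_next, rowvar=False) - cov).max()
print(err)   # small: the covariance is preserved up to Monte Carlo error
```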

The inequalities (2.1) and (2.3) of the Auxiliary Theorem will be applied not to the process $(Y^n(t))$ but to the (hidden Markov) process

$$(Z^n(t) : t = 1, 2, \dots),$$

defined as follows: Let

$$\operatorname{dist}\big(Z_i(t+1)\mid Y^{i-1}(t) = y^{i-1}(t), Y_{i+1}^n(t+1) = y_{i+1}^n(t+1)\big) = Q_i(\cdot\mid y^{i-1}(t), y_{i+1}^n(t+1))\qquad (t \ge 0), \qquad (4.2)$$

and let

$$\operatorname{dist}\big((Y_i(t), Z_i(t+1))\mid Y^{i-1}(t) = y^{i-1}(t), Y_{i+1}^n(t+1) = y_{i+1}^n(t+1)\big)$$

be that coupling of the distributions

$$\operatorname{dist}\big(Y_i(t)\mid Y^{i-1}(t) = y^{i-1}(t), Y_{i+1}^n(t+1) = y_{i+1}^n(t+1)\big)\quad\text{and}\quad Q_i(\cdot\mid y^{i-1}(t), y_{i+1}^n(t+1))$$

that achieves

$$W^2\Big(\operatorname{dist}\big(Y_i(t)\mid Y^{i-1}(t) = y^{i-1}(t), Y_{i+1}^n(t+1) = y_{i+1}^n(t+1)\big),\ Q_i(\cdot\mid y^{i-1}(t), y_{i+1}^n(t+1))\Big) \qquad (4.3)$$

for every value of the conditions. Thereby we have defined $\operatorname{dist}(Z_i(t+1)\mid Y^i(t), Y_{i+1}^n(t+1))$, and we assume that

$$Z_i(t+1) \to \big(Y^i(t), Y_{i+1}^n(t+1)\big) \to \text{everything else in the process } (Y^n(s), s \ne t, t+1) \text{ and } (Z^n(s), s \ne t+1). \qquad (4.4)$$

Let us extend dist(y n , Y n (l), Y n (2), . . . ) to dist(Y' n , Y n , Y n (l), Y n (2), . . . ) as 
follows. We define dist(Y ,n , Y n ) so that 

dist(y'\>7) =p n for all i, (4.5) 

and 

Y'^iY'^.Y^^Y 1 . (4.6) 

It is easy to see by recursion that this can be done. (Indeed, if for some $i$ the distribution $\mathrm{dist}(Y'^{i-1}, Y^n)$ is already defined, then we define $\mathrm{dist}(Y'^i, Y^n)$ by the relations $\mathrm{dist}(Y'_i \mid Y'^{i-1}, Y_i^n) = p_i(\cdot \mid Y'^{i-1}, Y_i^n)$ and $Y'_i \to (Y'^{i-1}, Y_i^n) \to Y^i$.)
We shall apply the Auxiliary Theorem to the sequence

$$(Y'^n, Z^n(1), Z^n(2), \dots, Z^n(t), \dots).$$

(It is easier to deal with the joint distribution $\mathrm{dist}(Y'^n, Z^n(1))$ than with $\mathrm{dist}(Y^n, Z^n(1))$. Note that $Y'^n \to Y^n \to Z^n(1)$.)

In order to use the Auxiliary Theorem for the process $(Z^n(t))$, we need good bounds for

$$D_t = \sum_{i=1}^n D\bigl(Z_i(t) \mid Z^{i-1}(t), Z_i^n(t+1) \,\big\|\, Q_i(\cdot \mid Z^{i-1}(t), Z_i^n(t+1))\bigr), \quad t \ge 1, \tag{4.7}$$

and

$$D_0 = \sum_{i=1}^n D\bigl(Y'_i \mid Y'^{i-1}, Z_i^n(1) \,\big\|\, Q_i(\cdot \mid Y'^{i-1}, Z_i^n(1))\bigr), \tag{4.8}$$

and we need to prove that

$$\lim_{t \to \infty} D(Z^n(t) \| q^n) = 0. \tag{4.9}$$
We shall bound $D_t$ by $E_t$, the counterpart of $D_t$ for the process $(Y^n(t))$:

$$E_t = \sum_{i=1}^n D\bigl(Y_i(t) \mid Y^{i-1}(t), Y_i^n(t+1) \,\big\|\, Q_i(\cdot \mid Y^{i-1}(t), Y_i^n(t+1))\bigr), \quad t \ge 0, \tag{4.10}$$

and exploit that, by (iii) of the Auxiliary Theorem,

$$\sum_{t=0}^{\infty} E_t \le D(Y^n \| X^n). \tag{4.11}$$

5. Bounding $D_t$ by $E_t$.

Lemma 5.1. Under the DE$(\rho_i)$ & ED$(\rho_i, \delta)$ condition,

$$D_t \le \frac{(1-\delta)^2}{2}\,(E_{t-1} + E_t) \quad \text{for } t \ge 1. \tag{5.1}$$

Consequently,

$$\sum_{t=1}^{\infty} D_t \le (1-\delta)^2 \Bigl(\frac{E_0}{2} + \sum_{t=1}^{\infty} E_t\Bigr) \le (1-\delta)^2 \Bigl(D(Y^n \| X^n) - \frac{E_0}{2}\Bigr). \tag{5.2}$$
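For the reader's convenience, here is how (5.2) follows from (5.1) and (4.11): sum (5.1) over $t \ge 1$, note that every $E_t$ with $t \ge 1$ is counted twice and $E_0$ once, and use $\sum_{t \ge 1} E_t \le D(Y^n\|X^n) - E_0$.

```latex
\sum_{t=1}^{\infty} D_t
  \le \frac{(1-\delta)^2}{2} \sum_{t=1}^{\infty} (E_{t-1} + E_t)
  = \frac{(1-\delta)^2}{2} \Bigl(E_0 + 2\sum_{t=1}^{\infty} E_t\Bigr)
  = (1-\delta)^2 \Bigl(\frac{E_0}{2} + \sum_{t=1}^{\infty} E_t\Bigr)
  \le (1-\delta)^2 \Bigl(\frac{E_0}{2} + D(Y^n\|X^n) - E_0\Bigr)
  = (1-\delta)^2 \Bigl(D(Y^n\|X^n) - \frac{E_0}{2}\Bigr).
```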

Remark 1. We could not prove a recursion formula ensuring exponential decrease for either of the sequences $(D_t)$, $(E_t)$. But Lemma 5.1 is a good replacement, in view of the upper bound (2.5): $\sum_{t=0}^{\infty} E_t \le D(Y^n \| X^n)$.
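The bound (2.5) used in Remark 1 has a transparent finite analogue. For a single Gibbs update of coordinate $i$, the chain rule for relative entropy shows that $D(\cdot\,\|\,q^n)$ drops by exactly the averaged divergence between the current conditional law of coordinate $i$ and $Q_i$; with the sweep order $n, \dots, 1$, summing over one sweep gives $D(Y^n(t)\|X^n) - D(Y^n(t+1)\|X^n) = E_t$, and telescoping gives $\sum_t E_t \le D(Y^n\|X^n)$. The Python sketch below checks the single-site identity and its sum over one sweep on $\{0,1\}^3$; the densities and all function names are hypothetical, and this is an illustration, not the paper's proof of the Auxiliary Theorem.

```python
import itertools
import math


def states(n):
    """All configurations in {0,1}^n."""
    return list(itertools.product((0, 1), repeat=n))


def kl(p, q):
    """D(p || q) for pmfs on a finite set, given as dicts (natural log)."""
    return sum(w * math.log(w / q[x]) for x, w in p.items() if w > 0)


def rest_mass(p, i, x):
    """Probability under p of the coordinates of x other than coordinate i."""
    return sum(p[x[:i] + (a,) + x[i + 1:]] for a in (0, 1))


def cond(p, i, x):
    """Conditional law of coordinate i under p, given the other coordinates of x."""
    m = rest_mass(p, i, x)
    return [p[x[:i] + (a,) + x[i + 1:]] / m for a in (0, 1)]


def update_site(p, q, i):
    """Law obtained from p by resampling coordinate i from Q_i(. | rest) of q."""
    return {x: rest_mass(p, i, x) * cond(q, i, x)[x[i]] for x in p}


def site_divergence(p, q, i):
    """E_p D(p(. | rest) || Q_i(. | rest)), the averaged local divergence at i."""
    total = 0.0
    for x in p:
        if x[i] != 0:  # visit each configuration of the other coordinates once
            continue
        m = rest_mass(p, i, x)
        if m == 0:
            continue
        pc, qc = cond(p, i, x), cond(q, i, x)
        total += m * sum(a * math.log(a / b) for a, b in zip(pc, qc) if a > 0)
    return total


def sweep(p, q, n):
    """One sweep i = n, ..., 1; returns the new law and the summed local terms."""
    total = 0.0
    for i in reversed(range(n)):
        total += site_divergence(p, q, i)
        p = update_site(p, q, i)
    return p, total
```

The returned `total` of `sweep` is the finite counterpart of $E_t$, computed on the intermediate laws of the sweep.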

Remark 2. In Section 7 we prove that $\lim_{s \to \infty} D(Z^n(s) \| X^n) = 0$. Anticipating this convergence, (5.2), together with the Auxiliary Theorem, implies that

$$D(Y^n \| X^n) \le D_0 + (1-\delta)^2 \Bigl(D(Y^n \| X^n) - \frac{E_0}{2}\Bigr),$$

i.e.,

$$\bigl(1 - (1-\delta)^2\bigr)\, D(Y^n \| X^n) \le D_0 - \frac{(1-\delta)^2}{2}\, E_0. \tag{5.3}$$

For the proof of (5.1) we introduce the following notation.

Notation. Let $(T, U, V)$ be a triple of random variables. We write

$$T \to U \to V$$

to express the Markov relation: $T$ and $V$ are conditionally independent given $U$.
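For finite-valued $T, U, V$ the relation $T \to U \to V$ can be tested directly: it holds if and only if $P(t,u,v)\,P(u) = P(t,u)\,P(u,v)$ for all $(t,u,v)$. A small hypothetical checker (the name `is_markov_chain` and the dict encoding of the joint law are choices made here):

```python
from collections import defaultdict


def is_markov_chain(joint, tol=1e-12):
    """True if T -> U -> V holds for a joint pmf {(t, u, v): prob}.

    Uses the criterion P(t, u, v) P(u) == P(t, u) P(u, v) for all triples.
    """
    p_u = defaultdict(float)
    p_tu = defaultdict(float)
    p_uv = defaultdict(float)
    ts, us, vs = set(), set(), set()
    for (t, u, v), w in joint.items():
        p_u[u] += w
        p_tu[(t, u)] += w
        p_uv[(u, v)] += w
        ts.add(t)
        us.add(u)
        vs.add(v)
    for t in ts:
        for u in us:
            for v in vs:
                lhs = joint.get((t, u, v), 0.0) * p_u[u]
                rhs = p_tu[(t, u)] * p_uv[(u, v)]
                if abs(lhs - rhs) > tol:
                    return False
    return True
```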

Using the above notation we state

Lemma 5.2.

$$Y_i(t) \to \bigl(Y^{i-1}(t), Y_i^n(t+1)\bigr) \to Y^i(t+1) \quad \text{for all } t \text{ and } i. \tag{5.4}$$



Lemma 5.2 follows at once from the definition of the Markov chain $(Y^n(t))$. Moreover, we need a generalization of conditional relative entropy:

Notation.

If $Y$ and $X$ are random variables with values in the same space, and with distributions $p$ resp. $q$, then $D(Y \| X)$ will denote the relative entropy $D(p \| q)$. Moreover, if we are given the conditional distributions $p(\cdot \mid V = v) = \mathrm{dist}(Y \mid V = v)$, $q(\cdot \mid u) = \mathrm{dist}(X \mid U = u)$ and the joint distribution $\pi = \mathrm{dist}(U, V)$, then for the average relative entropy

$$\mathbb{E}_\pi D(Y \mid V = v \,\|\, X \mid U = u)$$

we shall use any of the notations

$$D(Y \mid V \,\|\, X \mid U), \quad D(p(\cdot \mid V) \,\|\, X \mid U), \quad D(Y \mid V \,\|\, q(\cdot \mid U)), \quad D(p(\cdot \mid V) \,\|\, q(\cdot \mid U)).$$
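In the finite case the averaged quantity is just a weighted sum of ordinary relative entropies. The sketch below (hypothetical names; conditional laws encoded as lists indexed by the value of $Y$, resp. $X$) computes $D(Y\mid V \,\|\, X\mid U) = \mathbb{E}_\pi D(p(\cdot\mid V = v)\,\|\,q(\cdot\mid U = u))$.

```python
import math


def kl(p, q):
    """D(p || q) for two pmfs given as equal-length lists (natural log)."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total


def avg_conditional_kl(pi, p_given_v, q_given_u):
    """E_pi D(p(. | V=v) || q(. | U=u)), i.e. D(Y|V || X|U) in the notation above.

    pi:        dict {(u, v): probability}, the joint law of (U, V)
    p_given_v: dict v -> list, the conditional law of Y given V = v
    q_given_u: dict u -> list, the conditional law of X given U = u
    """
    return sum(w * kl(p_given_v[v], q_given_u[u]) for (u, v), w in pi.items() if w > 0)
```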



Proof of Lemma 5.1.

To prove (5.1), we shall use the following Markov relation, which will be proved later:

$$Z_i(t) \to \bigl(Y^{i-1}(t-1), Y_i^n(t)\bigr) \to \bigl(Z^{i-1}(t), Z_i^n(t+1)\bigr) \quad \text{for all } t \ge 1 \text{ and all } i. \tag{5.5}$$



Remark. The Markov relation (5.5) and its application (5.6) (below) are crucial in bounding $D_t$, and thereby in the proof of Theorem 1.

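Relation (5.5) implies, by the convexity of the relative entropy functional, that for all $t \ge 1$ and all $i$

$$\begin{aligned}
D\bigl(Z_i(t) \mid Z^{i-1}(t), Z_i^n(t+1) \,\big\|\, Q_i(\cdot \mid Z^{i-1}(t), Z_i^n(t+1))\bigr)
&\le D\bigl(Z_i(t) \mid Y^{i-1}(t-1), Y_i^n(t) \,\big\|\, Q_i(\cdot \mid Z^{i-1}(t), Z_i^n(t+1))\bigr) \\
&= D\bigl(Q_i(\cdot \mid Y^{i-1}(t-1), Y_i^n(t)) \,\big\|\, Q_i(\cdot \mid Z^{i-1}(t), Z_i^n(t+1))\bigr).
\end{aligned} \tag{5.6}$$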
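The inequality in (5.6) uses only that, by (5.5), the conditional law of $Z_i(t)$ given $(Z^{i-1}(t), Z_i^n(t+1))$ is a mixture of the laws $\mathrm{dist}(Z_i(t)\mid Y^{i-1}(t-1), Y_i^n(t))$, while the second argument stays fixed once the conditions are fixed; convexity of $p \mapsto D(p\|q)$ then gives the bound. A quick numerical sanity check of $D(\sum_k w_k p_k\,\|\,q) \le \sum_k w_k D(p_k\|q)$ on random finite distributions (illustration only; the gap equals $\sum_k w_k D(p_k\|\bar p)$ with $\bar p = \sum_k w_k p_k$):

```python
import math
import random


def kl(p, q):
    """D(p || q) for pmfs given as lists; q is assumed strictly positive."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def random_pmf(rng, size):
    """A random strictly positive pmf of the given size."""
    w = [rng.random() + 1e-9 for _ in range(size)]
    s = sum(w)
    return [x / s for x in w]


def convexity_gap(rng, size=4, components=3):
    """Return sum_k w_k D(p_k||q) - D(sum_k w_k p_k || q); convexity says >= 0."""
    q = random_pmf(rng, size)
    weights = random_pmf(rng, components)
    ps = [random_pmf(rng, size) for _ in range(components)]
    mixture = [sum(w * p[j] for w, p in zip(weights, ps)) for j in range(size)]
    return sum(w * kl(p, q) for w, p in zip(weights, ps)) - kl(mixture, q)
```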



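It follows that

$$D_t \le \sum_{i=1}^n D\bigl(Q_i(\cdot \mid Y^{i-1}(t-1), Y_i^n(t)) \,\big\|\, Q_i(\cdot \mid Z^{i-1}(t), Z_i^n(t+1))\bigr) \le \frac{(1-\delta)^2}{8}\, \mathbb{E}\bigl[d^{(n)}(Y^n(t-1), Z^n(t)) + d^{(n)}(Y^n(t), Z^n(t+1))\bigr]^2, \tag{5.7}$$

where the second inequality follows from the entropy-distance bound (3.1(ED)).

By the distance-entropy inequality this implies

$$D_t \le \frac{(1-\delta)^2}{4} \Biggl[\Bigl(\sum_{i=1}^n D\bigl(Y_i(t-1) \mid Y^{i-1}(t-1), Y_i^n(t) \,\big\|\, Q_i(\cdot \mid Y^{i-1}(t-1), Y_i^n(t))\bigr)\Bigr)^{1/2} + \Bigl(\sum_{i=1}^n D\bigl(Y_i(t) \mid Y^{i-1}(t), Y_i^n(t+1) \,\big\|\, Q_i(\cdot \mid Y^{i-1}(t), Y_i^n(t+1))\bigr)\Bigr)^{1/2}\Biggr]^2,$$

that is, $D_t \le \frac{(1-\delta)^2}{4}\bigl(\sqrt{E_{t-1}} + \sqrt{E_t}\bigr)^2$, and (5.1) follows from $(a+b)^2 \le 2(a^2 + b^2)$.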

We still have to prove the Markov relation (5.5). This will be proved as soon as we have shown that

$$Y_i(t-1) \to \bigl(Y^{i-1}(t-1), Y_i^n(t)\bigr) \to \bigl(Z^{i-1}(t), Z_i^n(t+1)\bigr). \tag{5.8}$$

By Lemma 4.1, for (5.8) it is enough to prove that

$$Y_i(t-1) \to \bigl(Y^{i-1}(t-1), Y^n(t)\bigr) \to \bigl(Z^{i-1}(t), Z_i^n(t+1)\bigr). \tag{5.9}$$

The relation

$$Y_i(t-1) \to \bigl(Y^{i-1}(t-1), Y^n(t)\bigr) \to Z^{i-1}(t)$$

follows from (4.4). The relation

$$Y_i(t-1) \to \bigl(Y^{i-1}(t-1), Y^n(t), Z^{i-1}(t)\bigr) \to Z_i^n(t+1)$$

follows from the Markov relation

$$Y^n(t-1) \to Y^n(t) \to Y^n(t+1) \to Y^n(t+2),$$

together with (4.4). Now (5.5), and thus the bound (5.1), is proved, and (5.2) follows using (4.11). □

6. Bounding $D_0 - \frac{(1-\delta)^2}{2} E_0$.

Recall that

$$D_0 = \sum_{i=1}^n D\bigl(Y'_i \mid Y'^{i-1}, Z_i^n(1) \,\big\|\, Q_i(\cdot \mid Y'^{i-1}, Z_i^n(1))\bigr)$$

and

$$E_0 = \sum_{i=1}^n D\bigl(Y_i \mid Y^{i-1}, Y_i^n(1) \,\big\|\, Q_i(\cdot \mid Y^{i-1}, Y_i^n(1))\bigr),$$

where $Y^n = Y^n(0)$, and $Y'^n$ was defined by (4.5) and (4.6).
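Lemma 6.1.

$$D_0 - \frac{(1-\delta)^2}{2}\, E_0 \le 2 \sum_{i=1}^n D\bigl(Y_i \mid Y^{i-1}, Y_i^n \,\big\|\, Q_i(\cdot \mid Y^{i-1}, Y_i^n)\bigr). \tag{6.1}$$

(Recall that $Y^n = Y^n(0)$.)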



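Proof.

By (4.5)-(4.6), and since $Y'^n \to Y^n \to Z^n(1)$, we have $Y'_i \to (Y'^{i-1}, Y_i^n) \to Z_i^n(1)$, whence

$$D\bigl(Y'_i \mid Y'^{i-1}, Z_i^n(1) \,\big\|\, Q_i(\cdot \mid Y'^{i-1}, Z_i^n(1))\bigr) \le D\bigl(Y'_i \mid Y'^{i-1}, Y_i^n \,\big\|\, Q_i(\cdot \mid Y'^{i-1}, Z_i^n(1))\bigr). \tag{6.2}$$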

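It is easy to check that the following identity holds:

$$\begin{aligned}
D\bigl(Y'_i \mid Y'^{i-1}, Y_i^n \,\big\|\, Q_i(\cdot \mid Y'^{i-1}, Z_i^n(1))\bigr)
&= D\bigl(Y'_i \mid Y'^{i-1}, Y_i^n \,\big\|\, Q_i(\cdot \mid Y'^{i-1}, Y_i^n)\bigr) + D\bigl(Q_i(\cdot \mid Y'^{i-1}, Y_i^n) \,\big\|\, Q_i(\cdot \mid Y'^{i-1}, Z_i^n(1))\bigr) \\
&\quad + \mathbb{E} \int p_i(y_i \mid Y'^{i-1}, Y_i^n)\, \log \frac{Q_i(y_i \mid Y'^{i-1}, Y_i^n)}{Q_i(y_i \mid Y'^{i-1}, Z_i^n(1))}\, d\lambda_i(y_i) \\
&\quad - \mathbb{E} \int Q_i(y_i \mid Y'^{i-1}, Y_i^n)\, \log \frac{Q_i(y_i \mid Y'^{i-1}, Y_i^n)}{Q_i(y_i \mid Y'^{i-1}, Z_i^n(1))}\, d\lambda_i(y_i).
\end{aligned} \tag{6.3}$$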

The expectation in the last two lines is with respect to $\mathrm{dist}(Y'^n, Y^n, Z^n(1))$.
The integrals in the last two lines of (6.3) can be written, respectively, in the following form:

$$\mathbb{E} \log \frac{Q_i(Y'_i \mid Y'^{i-1}, Y_i^n)}{Q_i(Y'_i \mid Y'^{i-1}, Z_i^n(1))} \tag{6.4}$$

and

$$\mathbb{E} \log \frac{Q_i(\Gamma_i \mid Y'^{i-1}, Y_i^n)}{Q_i(\Gamma_i \mid Y'^{i-1}, Z_i^n(1))}, \tag{6.5}$$



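where the expectation is with respect to $\mathrm{dist}(Y'^n, Y^n, Z^n(1), \Gamma^n)$. The conditional distribution $\mathrm{dist}(\Gamma_i \mid Y'^n, Y^n, Z^n(1))$ is defined by the conditions

$$\mathrm{dist}(\Gamma_i \mid Y'^{i-1}, Y_i^n) = Q_i(\cdot \mid Y'^{i-1}, Y_i^n),$$

and $\mathrm{dist}(Y'_i, \Gamma_i \mid Y'^{i-1}, Y_i^n)$ is that coupling of

$$\mathrm{dist}(Y'_i \mid Y'^{i-1}, Y_i^n) \quad\text{and}\quad Q_i(\cdot \mid Y'^{i-1}, Y_i^n)$$

that achieves the $W_2$-distance for all values of the conditions. Thereby we have defined $\mathrm{dist}(\Gamma_i \mid Y'^i, Y_i^n)$, and we assume that

$$\Gamma_i \to (Y'^i, Y_i^n) \to \text{everything else in } (Y'^n, Y^n, Z^n(1), \Gamma^n).$$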

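Putting together (6.2)-(6.5):

$$\begin{aligned}
D\bigl(Y'_i \mid Y'^{i-1}, Z_i^n(1) \,\big\|\, Q_i(\cdot \mid Y'^{i-1}, Z_i^n(1))\bigr)
&\le D\bigl(Y'_i \mid Y'^{i-1}, Y_i^n \,\big\|\, Q_i(\cdot \mid Y'^{i-1}, Y_i^n)\bigr) + D\bigl(Q_i(\cdot \mid Y'^{i-1}, Y_i^n) \,\big\|\, Q_i(\cdot \mid Y'^{i-1}, Z_i^n(1))\bigr) \\
&\quad + \mathbb{E}\Bigl[\log \frac{Q_i(Y'_i \mid Y'^{i-1}, Y_i^n)}{Q_i(Y'_i \mid Y'^{i-1}, Z_i^n(1))} - \log \frac{Q_i(\Gamma_i \mid Y'^{i-1}, Y_i^n)}{Q_i(\Gamma_i \mid Y'^{i-1}, Z_i^n(1))}\Bigr] \\
&= D\bigl(Y'_i \mid Y'^{i-1}, Y_i^n \,\big\|\, Q_i(\cdot \mid Y'^{i-1}, Y_i^n)\bigr) + D\bigl(Q_i(\cdot \mid Y'^{i-1}, Y_i^n) \,\big\|\, Q_i(\cdot \mid Y'^{i-1}, Z_i^n(1))\bigr) \\
&\quad + \mathbb{E}\bigl[V(Y'^i, Z_i^n(1)) - V(Y'^i, Y_i^n) - V(Y'^{i-1}, \Gamma_i, Z_i^n(1)) + V(Y'^{i-1}, \Gamma_i, Y_i^n)\bigr],
\end{aligned}$$

where in the last step we used that $Q_i$ is proportional to $\exp(-V)$, so that the normalizing factors cancel between the two logarithms.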
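Identity (6.3) is the general decomposition $D(p\|r) = D(p\|q) + D(q\|r) + \int (p - q)\log(q/r)$, and the passage to $V$ uses only that $Q_i \propto e^{-V}$ and $\int (p - q) = 0$, so the normalizing factors drop out and the cross term becomes $\int (p - q)\,(V_z - V_y)$, where $V_y$ and $V_z$ stand for $V$ evaluated with the conditioning blocks $Y_i^n$ and $Z_i^n(1)$. A finite sanity check in Python; `p`, `Vy` and `Vz` in the test are arbitrary hypothetical inputs:

```python
import math


def kl(p, q):
    """D(p || q) for pmfs given as lists (natural log)."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def gibbs(energies):
    """Normalized pmf proportional to exp(-energy)."""
    w = [math.exp(-e) for e in energies]
    s = sum(w)
    return [x / s for x in w]


def three_term_rhs(p, q, r):
    """D(p||q) + D(q||r) + sum_y (p(y) - q(y)) log(q(y) / r(y))."""
    cross = sum((a - b) * math.log(b / c) for a, b, c in zip(p, q, r))
    return kl(p, q) + kl(q, r) + cross


def cross_term_via_V(p, energies_y, energies_z):
    """sum_y (p(y) - q(y)) (V_z(y) - V_y(y)) with q = gibbs(energies_y)."""
    q = gibbs(energies_y)
    return sum((a - b) * (ez - ey) for a, b, ey, ez in zip(p, q, energies_y, energies_z))
```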
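Summing over $i$:

$$\begin{aligned}
D_0 &= \sum_{i=1}^n D\bigl(Y'_i \mid Y'^{i-1}, Z_i^n(1) \,\big\|\, Q_i(\cdot \mid Y'^{i-1}, Z_i^n(1))\bigr) \\
&\le \sum_{i=1}^n D\bigl(Y'_i \mid Y'^{i-1}, Y_i^n \,\big\|\, Q_i(\cdot \mid Y'^{i-1}, Y_i^n)\bigr) + \sum_{i=1}^n D\bigl(Q_i(\cdot \mid Y'^{i-1}, Y_i^n) \,\big\|\, Q_i(\cdot \mid Y'^{i-1}, Z_i^n(1))\bigr) \\
&\quad + \sum_{i=1}^n \mathbb{E}\bigl[V(Y'^i, Z_i^n(1)) - V(Y'^i, Y_i^n) - V(Y'^{i-1}, \Gamma_i, Z_i^n(1)) + V(Y'^{i-1}, \Gamma_i, Y_i^n)\bigr].
\end{aligned} \tag{6.6}$$

By the DE$(\rho_i)$ & SQ$(\rho_i, \delta)$ condition, (6.6) can be continued to

$$D_0 \le \sum_{i=1}^n D\bigl(Y'_i \mid Y'^{i-1}, Y_i^n \,\big\|\, Q_i(\cdot \mid Y'^{i-1}, Y_i^n)\bigr) + \frac{(1-\delta)^2}{8}\, \mathbb{E}(d^{(n)})^2(Y^n, Z^n(1)) + \frac{1-\delta}{2} \sqrt{\mathbb{E}(d^{(n)})^2(Y^n, Z^n(1))} \cdot \sqrt{\mathbb{E}(d^{(n)})^2(Y'^n, \Gamma^n)}. \tag{6.7}$$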



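By the definition (4.5) of $Y'^n$ we have

$$\sum_{i=1}^n D\bigl(Y'_i \mid Y'^{i-1}, Y_i^n \,\big\|\, Q_i(\cdot \mid Y'^{i-1}, Y_i^n)\bigr) = \sum_{i=1}^n D\bigl(Y_i \mid Y^{i-1}, Y_i^n \,\big\|\, Q_i(\cdot \mid Y^{i-1}, Y_i^n)\bigr). \tag{6.8}$$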



Moreover, by the definition of $\Gamma^n$, together with the distance-entropy bound (1.3(DE)) and (4.5):

$$\mathbb{E}(d^{(n)})^2(Y'^n, \Gamma^n) \le 2 \sum_{i=1}^n D\bigl(Y_i \mid Y^{i-1}, Y_i^n \,\big\|\, Q_i(\cdot \mid Y^{i-1}, Y_i^n)\bigr). \tag{6.9}$$



Further, by the definition of $Z^n(1)$ and the distance-entropy bound:

$$\mathbb{E}(d^{(n)})^2(Y^n, Z^n(1)) \le 2 E_0. \tag{6.10}$$
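Bounds (6.9) and (6.10) are instances of the distance-entropy inequality, read here (an assumption, consistent with the factor 2 in (6.9)-(6.10)) as $\rho_i\,W_2^2(\mu, Q_i) \le 2\,D(\mu\|Q_i)$ for each coordinate. The standard example is a Gaussian $Q = N(m, \sigma^2)$ with $\rho = 1/\sigma^2$ (Talagrand's inequality); for Gaussian $\mu = N(m', s^2)$ both sides are explicit, the gap is $2(x - 1 - \log x)$ with $x = s/\sigma$, and equality holds for pure translations. A short check (function names are hypothetical):

```python
import math


def w2_squared_gaussians(m1, s1, m2, s2):
    """W_2^2 between N(m1, s1^2) and N(m2, s2^2) on the real line."""
    return (m1 - m2) ** 2 + (s1 - s2) ** 2


def kl_gaussians(m1, s1, m2, s2):
    """D(N(m1, s1^2) || N(m2, s2^2))."""
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5


def de_gap(m1, s1, m2, s2):
    """2 D(mu || Q) - rho W_2^2(mu, Q) with Q = N(m2, s2^2) and rho = 1 / s2^2."""
    rho = 1.0 / s2 ** 2
    return 2 * kl_gaussians(m1, s1, m2, s2) - rho * w2_squared_gaussians(m1, s1, m2, s2)
```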



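Substituting (6.8)-(6.10) into (6.7), we get

$$D_0 \le \sum_{i=1}^n D\bigl(Y_i \mid Y^{i-1}, Y_i^n \,\big\|\, Q_i(\cdot \mid Y^{i-1}, Y_i^n)\bigr) + \frac{(1-\delta)^2}{4}\, E_0 + (1-\delta)\sqrt{E_0}\, \Bigl(\sum_{i=1}^n D\bigl(Y_i \mid Y^{i-1}, Y_i^n \,\big\|\, Q_i(\cdot \mid Y^{i-1}, Y_i^n)\bigr)\Bigr)^{1/2}. \tag{6.11}$$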



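It follows that

$$D_0 - \frac{(1-\delta)^2}{2}\, E_0 \le \sum_{i=1}^n D\bigl(Y_i \mid Y^{i-1}, Y_i^n \,\big\|\, Q_i(\cdot \mid Y^{i-1}, Y_i^n)\bigr) - \frac{(1-\delta)^2}{4}\, E_0 + (1-\delta)\sqrt{E_0}\, \Bigl(\sum_{i=1}^n D\bigl(Y_i \mid Y^{i-1}, Y_i^n \,\big\|\, Q_i(\cdot \mid Y^{i-1}, Y_i^n)\bigr)\Bigr)^{1/2}. \tag{6.12}$$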



Maximizing the right-hand side of (6.12) in $\sqrt{E_0}$ (it is quadratic in $\sqrt{E_0}$), we get (6.1). □
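The maximization can be checked directly. Writing $S$ for the sum on the right-hand side of (6.1) and $x = \sqrt{E_0}$, the right-hand side of (6.12) is $S - \frac{(1-\delta)^2}{4}x^2 + (1-\delta)\sqrt{S}\,x$; for $0 \le \delta < 1$ this downward parabola peaks at $x = 2\sqrt{S}/(1-\delta)$ with value $S + S = 2S$, which is (6.1). A minimal Python check (function names are hypothetical):

```python
import math


def rhs_6_12(x, S, delta):
    """S - (1-delta)^2/4 * x^2 + (1-delta) * sqrt(S) * x, with x = sqrt(E_0)."""
    a = (1 - delta) ** 2 / 4
    b = 1 - delta
    return S - a * x ** 2 + b * math.sqrt(S) * x


def argmax_rhs_6_12(S, delta):
    """Vertex of the downward parabola x -> rhs_6_12(x, S, delta), for 0 <= delta < 1."""
    a = (1 - delta) ** 2 / 4
    b = 1 - delta
    return b * math.sqrt(S) / (2 * a)
```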



Substituting (6.1) into (5.3) yields (1.8), the statement of Theorem 1. But for (1.8) to be valid, we still have to prove the entropy convergence (2.2).
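Chaining (5.3) with (6.1), and using $1 - (1-\delta)^2 = \delta(2-\delta)$, makes the constant of the resulting bound explicit (once the convergence (2.2) is available); here $Y^n \sim p^n$, $X^n \sim q^n$, and the sum is the sum of averaged relative entropies of the local specifications:

```latex
\delta(2-\delta)\, D(Y^n\|X^n)
  \le D_0 - \frac{(1-\delta)^2}{2} E_0
  \le 2\sum_{i=1}^{n} D\bigl(Y_i \mid Y^{i-1}, Y_i^n \,\big\|\, Q_i(\cdot \mid Y^{i-1}, Y_i^n)\bigr),
\qquad\text{hence}\qquad
D(p^n\|q^n) \le \frac{2}{\delta(2-\delta)} \sum_{i=1}^{n} D\bigl(Y_i \mid Y^{i-1}, Y_i^n \,\big\|\, Q_i(\cdot \mid Y^{i-1}, Y_i^n)\bigr).
```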






7. Convergence in entropy

Lemma 7.1.

$$\lim_{t \to \infty} D(Z^n(t) \| X^n) = 0.$$



For the proof we shall need the concept of the quadratic Wasserstein distance $W(p^n, q^n) = W_\rho(p^n, q^n)$ between measures on $X^n$, and the fact that the Markov operator $G$ defined by (4.1) (which defines the Gibbs sampler $(Y^n(t))$) is a contraction with respect to this distance:

Definition.

$$W(p^n, q^n) = W_\rho(p^n, q^n) = \inf_\pi \bigl[\mathbb{E}_\pi (d^{(n)})^2(Y^n, X^n)\bigr]^{1/2},$$

where $Y^n$ and $X^n$ are random variables with laws $p^n$ resp. $q^n$, and the infimum is taken over all distributions $\pi = \mathrm{dist}(Y^n, X^n)$ with marginals $p^n$ and $q^n$. Note that the $W$-distance depends on the metric $d$ and also on the numbers $\rho_i$ (since the $\rho_i$'s appear in the definition of $d^{(n)}$). For $1 \le i < n$ we define similarly

$$W(p_i^n, q_i^n) = W_\rho(p_i^n, q_i^n) = \inf_\pi \Bigl[\mathbb{E}_\pi \sum_{j=i+1}^n \rho_j\, d^2(Y_j, X_j)\Bigr]^{1/2}.$$



Lemma 7.2.

If $q^n$ satisfies the contractivity bound (3.8(CO)), then the Markov kernel $G$ is a contraction with respect to the $W_2$-distance, with rate

$$r(\delta) = \frac{1-\delta}{(1 + 2\delta - \delta^2)^{1/2}} < 1.$$

Consequently,

$$W^2(p^n, q^n) \le C(\delta) \cdot W^2(p^n, p^n \cdot G),$$

where $C(\delta)$ is a constant depending on $\delta$.

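Proof of Lemma 7.2.

Consider sequences $y^n, z^n \in X^n$, and define the random sequences $U^n, T^n$ with values in $X^n$ and distributions

$$\mathrm{dist}(U^n \mid y^n) = G(\cdot \mid y^n) \quad\text{and}\quad \mathrm{dist}(T^n \mid z^n) = G(\cdot \mid z^n).$$

Define a joining $\mathrm{dist}(U^n, T^n \mid y^n, z^n)$ successively, for $i = n, n-1, \dots, 1$, by taking for

$$\mathrm{dist}\bigl(U_i, T_i \mid y^n, z^n, U_i^n = u_i^n, T_i^n = t_i^n\bigr)$$

that joining of $Q_i(\cdot \mid y^{i-1}, u_i^n)$ and $Q_i(\cdot \mid z^{i-1}, t_i^n)$ that achieves the $W_2$-distance. Thereby we have successively defined a joining of $\mathrm{dist}(U^n \mid y^n) = G(\cdot \mid y^n)$ and $\mathrm{dist}(T^n \mid z^n) = G(\cdot \mid z^n)$. Let $\pi^n = \pi^n(\cdot \mid y^n, z^n) = \mathrm{dist}(U^n, T^n \mid y^n, z^n)$ denote this joining.

We have

$$\mathbb{E}_{\pi^n}\bigl[d^2(U_i, T_i) \mid y^n, z^n, U_i^n = u_i^n, T_i^n = t_i^n\bigr] = W_2^2\bigl(Q_i(\cdot \mid y^{i-1}, u_i^n),\, Q_i(\cdot \mid z^{i-1}, t_i^n)\bigr).$$

Thus the contractivity condition (3.8(CO)) implies

$$\mathbb{E}_{\pi^n}(d^{(n)})^2(U^n, T^n) = \sum_{i=1}^n \rho_i\, \mathbb{E}_{\pi^n} W_2^2\bigl(Q_i(\cdot \mid y^{i-1}, U_i^n),\, Q_i(\cdot \mid z^{i-1}, T_i^n)\bigr) \le \frac{(1-\delta)^2}{2} \Bigl[(d^{(n)})^2(y^n, z^n) + \mathbb{E}_{\pi^n}(d^{(n)})^2(U^n, T^n)\Bigr].$$

Rearranging terms, we get

$$\mathbb{E}_{\pi^n}(d^{(n)})^2(U^n, T^n) \le \frac{(1-\delta)^2}{1 + 2\delta - \delta^2}\, (d^{(n)})^2(y^n, z^n) = r(\delta)^2\, (d^{(n)})^2(y^n, z^n). \quad \square$$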
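The rearrangement at the end of this proof, and its agreement with the rate stated in Lemma 7.2, can be checked numerically under one assumption: that the contractivity bound yields $A \le \frac{(1-\delta)^2}{2}(B + A)$ with $A = \mathbb{E}_{\pi^n}(d^{(n)})^2(U^n, T^n)$ and $B = (d^{(n)})^2(y^n, z^n)$; the factor $1/2$ is the one that reproduces $r(\delta)$. The sketch also records one admissible constant for the "Consequently" part: since $q^n \cdot G = q^n$, the triangle inequality gives $W(p^n, q^n) \le W(p^n, p^n \cdot G) + r(\delta)\,W(p^n, q^n)$, so $C(\delta) = (1 - r(\delta))^{-2}$ works. The function names are hypothetical.

```python
import math


def rate(delta):
    """r(delta) = (1 - delta) / sqrt(1 + 2 delta - delta^2), as in Lemma 7.2."""
    return (1 - delta) / math.sqrt(1 + 2 * delta - delta ** 2)


def rearranged_factor(delta):
    """If A <= c (B + A) with c = (1 - delta)^2 / 2 < 1, then A <= c / (1 - c) * B."""
    c = (1 - delta) ** 2 / 2
    return c / (1 - c)


def constant_C(delta):
    """One admissible C(delta): W(p, q) <= W(p, pG) + r W(p, q) gives (1 - r)^(-2)."""
    return 1.0 / (1.0 - rate(delta)) ** 2
```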

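Proof of Lemma 7.1.

We have

$$D(Z^n(t) \| X^n) = \sum_{i=1}^n D\bigl(Z_i(t) \mid Z^{i-1}(t) \,\big\|\, q_i(\cdot \mid Z^{i-1}(t))\bigr), \tag{7.1}$$

where $q_i(\cdot \mid x^{i-1}) = \mathrm{dist}(X_i \mid X^{i-1} = x^{i-1})$.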
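Identity (7.1) is the chain rule for relative entropy along the coordinate order $1, \dots, n$. A small check on $\{0,1\}^3$, with hypothetical `p` and `q` standing for $\mathrm{dist}(Z^n(t))$ and $\mathrm{dist}(X^n)$:

```python
import itertools
import math


def marginal(p, k):
    """Law of the first k coordinates of a pmf p on {0,1}^n (dict of tuples)."""
    out = {}
    for x, w in p.items():
        out[x[:k]] = out.get(x[:k], 0.0) + w
    return out


def kl_dict(p, q):
    """D(p || q) for pmfs given as dicts; q is assumed strictly positive."""
    return sum(w * math.log(w / q[x]) for x, w in p.items() if w > 0)


def chain_rule_terms(p, q, n):
    """[E_p D(p(x_i | x^{i-1}) || q(x_i | x^{i-1})) for i = 1, ..., n]."""
    terms = []
    for i in range(n):
        p_prev, q_prev = marginal(p, i), marginal(q, i)
        p_cur, q_cur = marginal(p, i + 1), marginal(q, i + 1)
        total = 0.0
        for x, w in p_cur.items():
            if w > 0:
                cond_p = w / p_prev[x[:i]]
                cond_q = q_cur[x] / q_prev[x[:i]]
                total += w * math.log(cond_p / cond_q)
        terms.append(total)
    return terms
```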

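Relation (5.5) implies, by the convexity of the relative entropy functional, that for all $t \ge 1$ and all $i$

$$\begin{aligned}
D\bigl(Z_i(t) \mid Z^{i-1}(t) \,\big\|\, q_i(\cdot \mid Z^{i-1}(t))\bigr)
&\le D\bigl(Z_i(t) \mid Y^{i-1}(t-1), Y_i^n(t) \,\big\|\, Q_i(\cdot \mid Z^{i-1}(t), S_i^n(t+1))\bigr) \\
&= D\bigl(Q_i(\cdot \mid Y^{i-1}(t-1), Y_i^n(t)) \,\big\|\, Q_i(\cdot \mid Z^{i-1}(t), S_i^n(t+1))\bigr),
\end{aligned} \tag{7.2}$$

where $S_i^n(t+1)$ denotes a random sequence in $X^{n-i}$ satisfying

$$\mathrm{dist}\bigl(S_i^n(t+1) \mid Z^i(t) = z^i\bigr) = q_i^n(\cdot \mid z^i) = \mathrm{dist}(X_i^n \mid X^i = z^i). \tag{7.3}$$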

In order to prove that the right-most side of (7.2) tends to 0 as $t \to \infty$, we want to use the entropy-distance bound (3.1(ED)). Let us fix sequences $y^i, z^i \in X^i$. We are going to define a coupling

$$\pi_i^n(\cdot \mid y^i, z^i) = \mathrm{dist}\bigl(Y_i^n(t), S_i^n(t+1) \mid Y^i(t-1) = y^i, Z^i(t) = z^i\bigr)$$

of the conditional distributions

$$\mathrm{dist}\bigl(Y_i^n(t) \mid Y^i(t-1) = y^i, Z^i(t) = z^i\bigr) \quad\text{and}\quad \mathrm{dist}\bigl(S_i^n(t+1) \mid Z^i(t) = z^i\bigr) = q_i^n(\cdot \mid z^i) \tag{7.4}$$

such that

$$\int \mathbb{E}_{\pi_i^n(\cdot \mid y^i, z^i)} \sum_{j=i+1}^n \rho_j\, d^2\bigl(Y_j(t), S_j(t+1)\bigr) \tag{7.5}$$

is small, where the integration is with respect to $\mathrm{dist}(Y^i(t-1), Z^i(t))$.

To simplify the notation in the conditional distributions somewhat, we shall temporarily write $y^i$ and $z^i$ in place of $Y^i(t-1) = y^i$ and $Z^i(t) = z^i$.

With this notation we have to bound

$$W^2\bigl(\mathrm{dist}(Y_i^n(t) \mid y^i, z^i),\; q_i^n(\cdot \mid z^i)\bigr). \tag{7.6}$$

In analogy with the Markov kernel $G : X^n \to X^n$ defined by (4.1), we can define the Markov kernel

$$G_i^n(\cdot \mid z^i, \cdot) : X^{n-i} \to X^{n-i}$$

using the local specifications of $q_i^n(\cdot \mid z^i)$:

$$G_i^n(v_i^n \mid z^i, u_i^n) = \prod_{j=i+1}^n Q_j\bigl(v_j \mid z^i, u_i^{j-1}, v_j^n\bigr),$$

where $u_i^{j-1} = (u_{i+1}, \dots, u_{j-1})$. That is, the action of the Markov kernel $G_i^n(\cdot \mid z^i, \cdot)$ given condition $u_i^n$ is the same as that of the Markov kernel $G$ given condition $(z^i, u_i^n)$, restricted to the coordinates $i < j \le n$.

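For fixed $z^i$, $q_i^n(\cdot \mid z^i)$ satisfies the contractivity bound (3.8(CO)). Thus by Lemma 7.2, the Markov kernel $G_i^n(\cdot \mid z^i, \cdot)$ is a contraction with respect to the $W_2$-distance, with at most the rate obtained for $G$. Thus we get the following bound for (7.6):

$$W^2\bigl(\mathrm{dist}(Y_i^n(t) \mid y^i, z^i),\; q_i^n(\cdot \mid z^i)\bigr) \le C(\delta) \cdot W^2\bigl(\mathrm{dist}(Y_i^n(t) \mid y^i, z^i),\; \mathrm{dist}(Y_i^n(t) \mid y^i, z^i) \cdot G_i^n(\cdot \mid z^i, \cdot)\bigr). \tag{7.7}$$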

Consider the (partly random) sequence $(z^i, Y_i^n(t))$, and let $\mu^n(\cdot \mid y^i, z^i)$ denote its conditional distribution:

$$\mu^n(\cdot \mid y^i, z^i) = \mathrm{dist}(z^i, Y_i^n(t) \mid y^i, z^i).$$

(The marginal of $\mu^n(\cdot \mid y^i, z^i)$ for the coordinates up to $i$ is concentrated on $z^i$.) We have

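$$\begin{aligned}
\mathrm{dist}(Y_i^n(t) \mid y^i, z^i) &= \text{marginal of } \mathrm{dist}(Y^n(t-1) \mid y^i, z^i) \cdot G \text{ for the coordinates } i < j \le n, \\
\mathrm{dist}(Y_i^n(t) \mid y^i, z^i) \cdot G_i^n(\cdot \mid z^i, \cdot) &= \text{marginal of } \mu^n(\cdot \mid y^i, z^i) \cdot G \text{ for the coordinates } i < j \le n.
\end{aligned}$$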

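Thus

$$\begin{aligned}
W^2\bigl(\mathrm{dist}(Y_i^n(t) \mid y^i, z^i),\; \mathrm{dist}(Y_i^n(t) \mid y^i, z^i) \cdot G_i^n(\cdot \mid z^i, \cdot)\bigr)
&\le W^2\bigl(\mathrm{dist}(Y^n(t-1) \mid y^i, z^i),\; \mu^n(\cdot \mid y^i, z^i)\bigr) \\
&= W^2\bigl(\mathrm{dist}(y^i, Y_i^n(t-1) \mid y^i, z^i),\; \mathrm{dist}(z^i, Y_i^n(t) \mid y^i, z^i)\bigr) \\
&= \sum_{j=1}^i \rho_j\, d^2(y_j, z_j) + W^2\bigl(\mathrm{dist}(Y_i^n(t-1) \mid y^i, z^i),\; \mathrm{dist}(Y_i^n(t) \mid y^i, z^i)\bigr).
\end{aligned} \tag{7.8}$$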

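Substituting (7.8) into (7.7):

$$W^2\bigl(\mathrm{dist}(Y_i^n(t) \mid y^i, z^i),\; q_i^n(\cdot \mid z^i)\bigr) \le C(\delta) \Bigl[\sum_{j=1}^i \rho_j\, d^2(y_j, z_j) + W^2\bigl(\mathrm{dist}(Y_i^n(t-1) \mid y^i, z^i),\; \mathrm{dist}(Y_i^n(t) \mid y^i, z^i)\bigr)\Bigr]. \tag{7.9}$$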

To estimate the second term in the square bracket in (7.9), we are going to define a good coupling of the conditional distributions

$$\mathrm{dist}\bigl(Y_i^n(t-1) \mid Y^i(t-1) = y^i, Z^i(t) = z^i\bigr) \quad\text{and}\quad \mathrm{dist}\bigl(Y_i^n(t) \mid Y^i(t-1) = y^i, Z^i(t) = z^i\bigr). \tag{7.10}$$

(This requires a tedious argument, since for $\mathrm{dist}(Y^n(t-1), Y^n(t))$, as we have defined it, $\mathbb{E}\, d^{(n)}(Y^n(t-1), Y^n(t))$ does not tend to 0.) We define a joint distribution

$$\mathrm{dist}\bigl(\eta^n(t-1), \eta_i^n(t), \zeta^i(t)\bigr) \tag{7.11}$$

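that will satisfy

$$\mathrm{dist}\bigl(\eta^n(t-1), \zeta^i(t)\bigr) = \mathrm{dist}\bigl(Y^n(t-1), Z^i(t)\bigr) \tag{7.12}$$

and

$$\mathrm{dist}\bigl(\eta^i(t-1), \eta_i^n(t), \zeta^i(t)\bigr) = \mathrm{dist}\bigl(Y^i(t-1), Y_i^n(t), Z^i(t)\bigr). \tag{7.13}$$

By (7.12) and (7.13),

$$\mathrm{dist}\bigl(\eta_i^n(t-1), \eta_i^n(t) \mid \eta^i(t-1), \zeta^i(t)\bigr) \tag{7.14}$$

will realize the desired coupling of the conditional distributions (7.10).

We define the joint distribution (7.11) as follows. First we define $\mathrm{dist}(\eta^n(t-1), \eta_i^n(t))$. Put $\mathrm{dist}(\eta^n(t-1)) = \mathrm{dist}(Y^n(t-1))$, and define recursively, for $i < k \le n$ (starting from $k = n$),

$$\mathrm{dist}\bigl(\eta_k(t) \mid \eta^{k-1}(t-1), \eta_k^n(t)\bigr) = Q_k\bigl(\cdot \mid \eta^{k-1}(t-1), \eta_k^n(t)\bigr). \tag{7.15}$$

Then define

$$\mathrm{dist}\bigl(\eta_k(t-1), \eta_k(t) \mid \eta^{k-1}(t-1), \eta_k^n(t)\bigr)$$

as that coupling of

$$\mathrm{dist}\bigl(\eta_k(t-1) \mid \eta^{k-1}(t-1), \eta_k^n(t)\bigr) \quad\text{and}\quad Q_k\bigl(\cdot \mid \eta^{k-1}(t-1), \eta_k^n(t)\bigr)$$

that achieves the $W_2$-distance for all values of the conditions. Thereby we have defined

$$\mathrm{dist}\bigl(\eta_k(t) \mid \eta^k(t-1), \eta_k^n(t)\bigr)$$

for all $i < k \le n$. Postulating

$$\eta_k(t) \to \bigl(\eta^k(t-1), \eta_k^n(t)\bigr) \to \eta_k^n(t-1),$$

we have defined

$$\mathrm{dist}\bigl(\eta^n(t-1), \eta_i^n(t)\bigr). \tag{7.16}$$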

Observe that we have

$$\mathrm{dist}\bigl(\eta^k(t-1), \eta_k^n(t)\bigr) = \mathrm{dist}\bigl(Y^k(t-1), Y_k^n(t)\bigr), \quad i \le k \le n. \tag{7.17}$$
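The distance-entropy inequality (1.3(DE)), together with (7.17), implies

$$\begin{aligned}
\sum_{i < j \le n} \rho_j\, \mathbb{E}\, d^2\bigl(\eta_j(t-1), \eta_j(t)\bigr)
&\le 2 \sum_{i < j \le n} D\bigl(\eta_j(t-1) \mid \eta^{j-1}(t-1), \eta_j^n(t) \,\big\|\, Q_j(\cdot \mid \eta^{j-1}(t-1), \eta_j^n(t))\bigr) \\
&= 2 \sum_{i < j \le n} D\bigl(Y_j(t-1) \mid Y^{j-1}(t-1), Y_j^n(t) \,\big\|\, Q_j(\cdot \mid Y^{j-1}(t-1), Y_j^n(t))\bigr).
\end{aligned} \tag{7.18}$$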

Now we extend the joint distribution (7.16) to the joint distribution (7.11), setting

$$\mathrm{dist}\bigl(\eta^i(t-1), \eta_i^n(t), \zeta^i(t)\bigr) = \mathrm{dist}\bigl(Y^i(t-1), Y_i^n(t), Z^i(t)\bigr) \tag{7.19}$$

and

$$\zeta^i(t) \to \bigl(\eta^i(t-1), \eta_i^n(t)\bigr) \to \eta_i^n(t-1). \tag{7.20}$$

Relations (7.19)-(7.20) imply (7.12). They also imply (7.13), using the Markov relation

$$Z^i(t) \to \bigl(Y^i(t-1), Y_i^n(t)\bigr) \to Y_i^n(t-1)$$

(the analog of (7.20) for $\mathrm{dist}(Y^n(t-1), Y_i^n(t), Z^i(t))$).
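Thus we have

$$\int W^2\bigl(\mathrm{dist}(Y_i^n(t-1) \mid y^i, z^i),\; \mathrm{dist}(Y_i^n(t) \mid y^i, z^i)\bigr) \le \sum_{i < j \le n} \rho_j\, \mathbb{E}\, d^2\bigl(\eta_j(t-1), \eta_j(t)\bigr), \tag{7.21}$$

where the integration is with respect to $\mathrm{dist}(Y^i(t-1), Z^i(t))$.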

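Substituting (7.21) into (7.9), and integrating with respect to $\mathrm{dist}(Y^i(t-1), Z^i(t))$, we get

$$\int W^2\bigl(\mathrm{dist}(Y_i^n(t) \mid y^i, z^i),\; q_i^n(\cdot \mid z^i)\bigr) \le C(\delta) \Bigl[\sum_{j=1}^i \rho_j\, \mathbb{E}\, d^2\bigl(Y_j(t-1), Z_j(t)\bigr) + \sum_{i < j \le n} \rho_j\, \mathbb{E}\, d^2\bigl(\eta_j(t-1), \eta_j(t)\bigr)\Bigr]. \tag{7.22}$$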



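Substituting (7.18) into (7.22):

$$\int W^2\bigl(\mathrm{dist}(Y_i^n(t) \mid y^i, z^i),\; q_i^n(\cdot \mid z^i)\bigr) \le C(\delta) \Bigl[\sum_{j=1}^i \rho_j\, \mathbb{E}\, d^2\bigl(Y_j(t-1), Z_j(t)\bigr) + 2 \sum_{i < j \le n} D\bigl(Y_j(t-1) \mid Y^{j-1}(t-1), Y_j^n(t) \,\big\|\, Q_j(\cdot \mid Y^{j-1}(t-1), Y_j^n(t))\bigr)\Bigr]. \tag{7.23}$$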

By the distance-entropy bound, the right-hand side of (7.23) is at most

$$4\, C(\delta)\, E_{t-1} \to 0 \quad \text{as } t \to \infty,$$

since $\sum_t E_t < \infty$ by (4.11). Thus we have found a coupling of the distributions (7.4) such that the integral (7.5) tends to 0, and therefore so do (7.2) and (7.1). Lemma 7.1 is proved.